b40c72f78f- Fix the process of shutting down the worker, in case the user sends the relevant request, while the worker is stuck in a data-request error-loop. - Upload the updated gradle-wrapper.
master
2.1.10
Lampros Smyrnaios2024-04-29 17:08:40 +0300
24c4a75acf- Use the "RollingFile" logs-appender by default. - Set the next version.Lampros Smyrnaios2024-02-08 18:51:10 +0200
50d756d582- Automatically use the latest version of "publications_retriever" software from the Nexus maven-repository. - Update Gradle. - Update License. - Configure the destination of the logs in the "application.properties" file.Lampros Smyrnaios2024-02-08 18:33:18 +0200
066d6f665f- Take into account the new "errorMsg" value returned by "LoaderAndChecker.getWasValidAndCouldRetry()". - Update dependencies.Lampros Smyrnaios2023-12-18 15:17:51 +0200
9073f56227Revert the "read-timeout" value back to 1 hour, as the load of either server is not that big of a problem; it is a frequent network-"lag" that causes the issue, which is not solved even with 2 hours of waiting.Lampros Smyrnaios2023-10-31 15:08:57 +0200
69ea5b6d19- Increase the "ReadTimeout" to 2 hours, as the Worker struggles to get the assignments-data in time. - Revert the change about special handling of the "RestClientException". The exMsg was appearing in a different line, in the logs, and was a "SocketTimeoutException".
2.1.5
Lampros Smyrnaios2023-10-27 18:39:10 +0300
bfa76e9484- Show the full stacktrace in the weird case of a "RestClientException" without an exception-message. Also, in this case, retry immediately, as there is no long-lasting network problem that requires some time between requests, but most probably a random interruption. - Code polishing.Lampros Smyrnaios2023-10-27 17:36:54 +0300
1b45f384a7- In case a faulty "assignmentsCounter" was given to the "addReportResultToWorker"-endpoint, then return an explanatory error-message along with the HTTP-404 error. - Update Gradle.Lampros Smyrnaios2023-10-06 15:45:53 +0300
49cd0c19c2- Increase the "hoursToWaitBeforeDeletion" to 48. - Adjust the number and size of log files.
2.1.3
Lampros Smyrnaios2023-08-31 17:54:07 +0300
e85282d35bUpdate the "addReportResultToWorker"-endpoint to check if the given "assignmentsCounter" was handled by that worker, without considering the related full-texts directory, since that may have been deleted in the meantime.Lampros Smyrnaios2023-08-31 17:52:52 +0300
dc97b323c9- Show a warning, if the "numOfUnretrievedFiles" is over 50. - Delete gradle .zip file after installation. - Code polishing.
2.1.2
Lampros Smyrnaios2023-08-04 15:33:48 +0300
9c897b8bf4- Make use of the new Normalizer utilized by the PublicationRetriever plugin. - Code polishing.Lampros Smyrnaios2023-06-10 02:40:45 +0300
2aedae2367- In case a serious error happened while processing the assignments, instead of shutting down immediately, now the Worker shuts down the executor service, registers that it will shut down soon and waits for the Controller to retrieve the already downloaded full-text files. - In case the full-texts' subdirectory could not be created, then terminate the "handleAssignment" method immediately. No posting of a faulty workerReport to the Controller should happen. - Code polishing.Lampros Smyrnaios2023-05-31 15:25:36 +0300
4a95826f58- Avoid processing the assignments, for which the assignments_full-texts subdirectory cannot be created. - Avoid a double-log.Lampros Smyrnaios2023-05-31 02:27:24 +0300
7f3ca80959Bypass url-canonicalization for urls containing certain uncommon characters which cause the urls to get rejected.Lampros Smyrnaios2023-05-30 19:45:14 +0300
a9b1b20a51- Prevent running out of space, by checking the available free space and stalling the acquisition of new assignments until more free space becomes available. - Fix missing change.Lampros Smyrnaios2023-05-30 17:58:29 +0300
0908dcab8aUse a single "restTemplate" object, with the same timeouts (a bit increased from the old requestRestTemplate, to account for a possible overloaded Controller), since we no longer need to wait for hours until the workerReport is processed by the Controller.Lampros Smyrnaios2023-05-29 14:15:55 +0300
2b69733912- Increase the test-delays of the scheduled tasks. - Update dependencies.
2.0
Lampros Smyrnaios2023-05-29 12:45:43 +0300
f57314908a- Improve elapsed time precision for the "lastModified" metadata of the assignments-fulltext subDirectories. - Code polishing.Lampros Smyrnaios2023-05-25 00:37:44 +0300
1bf27a5a4e- Fix a bug, which caused the old full-text files to not be deleted. - Reduce the "InitialDelay" for the "checkIfShouldShutdown" scheduler.Lampros Smyrnaios2023-05-24 16:47:53 +0300
0ca02f3587Change the delay values of scheduledTasks to production ones.Lampros Smyrnaios2023-05-24 13:56:20 +0300
bfa569685a- Use the "POST" method for shutdown and cancelShutdown requests. - Polish some messages.Lampros Smyrnaios2023-05-23 22:24:49 +0300
9fdaa9503b- Delete any left-over full-texts after 36 hours. - Upon shutting down, post a "shutdownReport" to the Controller.Lampros Smyrnaios2023-05-23 22:22:57 +0300
903032f454- After a WorkerReport has been sent, ask for new assignments immediately. So, the Worker does not have to wait for hours for the Controller to check for duplicate files in the DB, retrieve and upload the full-texts and insert the records to the DB. - Special care is taken to delete the delivered full-texts as soon as possible. - Write the workerReport to a json-file, in case something goes wrong, and keep it until the Controller notifies the Worker that the processing was successful.Lampros Smyrnaios2023-05-23 22:19:41 +0300
9cb43b3d94- Improve startup speed, by using a faster remote server to get the host's machine public IP. This also reduces the risk of not being able to get the public IP at all. - Set the App to gracefully shut down the WebServer and wait up to 2 minutes. - Increase the waiting time for the "PublicationsRetriever.executor" to shut down, to 2 minutes.Lampros Smyrnaios2023-05-23 20:17:58 +0300
4d90846261- In case the specified "controllerIP" is actually a domain-name, find its IP-address, so that a proper IP-to-IP comparison can be performed and the "securityChecks" can pass. - Increase the "read-timeout" when searching for the host's machine public-IP. - Update dependencies. - Code polishing.Lampros Smyrnaios2023-05-22 21:25:22 +0300
bd0ead816dMake the time-out value of "restTemplateForReport" scale along with the "maxAssignmentsLimitPerBatch".Lampros Smyrnaios2023-05-16 19:08:59 +0300
29a54f0b30Remove the "shutDownOrCancelCode" from security checks, since we have an IP whitelisting mechanism in place.Lampros Smyrnaios2023-05-03 15:15:46 +0300
0ea7bccadbLeave the Max-Heap-Size at 8Gb; we assume that enough swap space will be available on the host. We can still override the max-heap-size if desired.Lampros Smyrnaios2023-04-29 17:55:03 +0300
d5a997ad3dUse restTemplates with different read timeouts depending on the operation. For the assignments-request we need a shorter read timeout than the one we need for the worker-report. This guarantees that the connection does not hang for so long, when the Controller crashes before sending the assignments.Lampros Smyrnaios2023-04-29 17:24:16 +0300
0ba15dd31aIncrease the "requestReadTimeoutDuration" to 10 hours, as the number of full-texts to be transferred to the Controller keeps getting larger.Lampros Smyrnaios2023-04-26 15:08:46 +0300
839a797124- Improve performance of full-texts transferring to the Controller, by preloading some bytes for faster response to the Controller's read requests. - Optimize directories-creation process by eliminating the additional check for existence, as that check already takes place inside the "mkdirs()" method. - Remove the obsolete code which used a different base-dir, in case the specific assignments' subdirectory failed to be created. Since the user-defined baseDir has already been successfully created upon initialization, any problem on creating subdirectories inside that base-directory will most likely persist even when changing the base directory. Additionally, even if the subdirectory with the changed base-directory succeeded, the "FullTextsController.getFullTexts()" method would not use it, resulting in errors. - Code polishing.Lampros Smyrnaios2023-03-08 13:12:17 +0200
4da54e7a7d- Show a warning, in case the number of archived files is different from the number of requested files. - Code polishing. - Update Gradle.Lampros Smyrnaios2023-03-07 16:25:10 +0200
ec09ecc7ff- Refactor and Spring-ify the File-storage initialization process. - Fix the problematic file-storage-path (it could not be used when the Controller was requesting the full-texts), which was produced when the user-defined path could not be created.Lampros Smyrnaios2023-03-07 16:21:32 +0200
ba989484e4Improve performance when archiving and compressing the full-texts.Lampros Smyrnaios2023-03-02 17:47:58 +0200
ff4fd3d289- Show the elapsed time for each assignments-request to be processed by the Worker. - Update dependencies.Lampros Smyrnaios2023-03-02 17:34:44 +0200
66d3f7bcb2- Show a warning, in case the number of results is different from the number of the assignments (due to missing / double logging). - Update Spring.Lampros Smyrnaios2023-02-24 23:27:02 +0200
81b61b530fDrastically improve performance by applying a pre-processing algorithm for the assignments-list to open some "space" between assignments which have the same domain, which, in turn, causes the threads to block less during execution. (The threads block, due to the mandatory "politeness-delay" before reconnecting with the same domain, in order to avoid overloading the remote servers.)Lampros Smyrnaios2023-02-24 23:23:37 +0200
84a37bd4b7- Handle the case, where an instance of a urlReport record (having the same id and sourceUrl), may have failed to give a docUrl, due to an error, even if another instance gives the docUrl and the docFile. The absence of that handling could lead to a record-instance, being assigned a "fileLocation" which was actually an error-message (comment), and as a result the real "fileLocation" would have never been reached to be assigned, so the payload would be lost. - Improve exceptions-handling.Lampros Smyrnaios2023-02-21 15:22:49 +0200
13f56d16c0Inform the user, if a previous "shutdownWorker"-request has been given, in "GeneralController.shutdownWorkerGracefully()"-endpoint.Lampros Smyrnaios2023-02-01 16:41:40 +0200
24b52fba63- Refactor the initialization and configuration process and Spring-ify the project. - Update Spring dependency.Lampros Smyrnaios2023-01-25 18:33:49 +0200
d6ff62d2efUpdate the "installAndRun.sh" script: - Add the ability to build and run the app without re-installing the PublicationsRetriever library. This is useful when trying a non-published version of that library. - Fix a wrong variable-name.Lampros Smyrnaios2023-01-20 01:59:26 +0200
bd0d9eb36f- Delete the transferred full-texts as soon as possible, in order to mitigate the "No space left on device"-error, which may appear, in case we have some very large files. - Use the new "GenericUtils.clearBlockingData()" method from the "PublicationsRetriever" library. - Remove the deprecated "getMultipleFullTexts"-endpoint, along with the Zip-related code.Lampros Smyrnaios2023-01-18 16:55:59 +0200
7dd5719bff- Update a method-call, to reflect the latest changes in the "PublicationsRetriever"-software. - Add TODOs.Lampros Smyrnaios2023-01-17 18:25:49 +0200
c283cb4365Improve exception-handling in "AssignmentsHandler.postWorkerReport()".Lampros Smyrnaios2023-01-16 15:22:32 +0200
d96d0c68cdMake sure the "responseCode" is "200-OK", before trying to get the InputStream in "UriBuilder.getPublicIP()".Lampros Smyrnaios2023-01-11 16:02:31 +0200
fd62ac567e- Add a new endpoint "getFullTextsImproved" which uses Facebook's [**Zstandard**](https://facebook.github.io/zstd/) compression algorithm, which brings very big benefits on compression rate and speed. - Remove some dependencies.Lampros Smyrnaios2023-01-09 15:48:30 +0200
778dc6e25c- Improve the stability of "UriBuilder.getPublicIP()", by using a "HttpURLConnection" to increase the connection and read timeouts and avoid timeout-exceptions. - Show the number of assignments which are requested from the Controller, in the log-message. - Update Spring.Lampros Smyrnaios2023-01-03 18:43:26 +0200
378db2ff2f- Add an existence-check for the "publications_retriever"-JAR, before trying to make a backup, inside "installAndRun.sh". - Add a final logging message, right before the app shuts down.Lampros Smyrnaios2022-12-15 14:15:24 +0200
8c1daadad0- Increase the "requestReadTimeoutDuration" to 5 hours. - Improve gradle's performance.Lampros Smyrnaios2022-12-12 17:49:14 +0200
d37cd738a0Refactor the full-texts deletion process to reduce storage space and complexity: - Delete the assignments-batch full-texts after the whole procedure (for each assignments-batch) is finished, either successfully or not. - Do not check for remaining files, when the Worker shuts down, since, in case of problematic handling the files are deleted anyway.Lampros Smyrnaios2022-12-07 12:29:05 +0200
326af0f12d- Return a success-message in the response-body, of the "shutdownWorkerGracefully" and "cancelShutdownWorkerGracefully" endpoints. - Apply the checks for the "totalZipBatches" param, before the Worker-related checks, in "FullTextsController.getMultipleFullTexts()" - Show the Heap-sizes in megabytes.Lampros Smyrnaios2022-12-05 21:58:16 +0200
5f48f72f06- Add handling for the case, when the Controller could not retrieve any assignments from the database (without an error). - Improve exception handling. - Remove obsolete code.Lampros Smyrnaios2022-12-05 16:47:15 +0200
182d6153d4- Set some optimization settings for gradle. - Fix error-handling in "installAndRun.sh". - Update dependencies.Lampros Smyrnaios2022-11-30 16:25:57 +0200
01f12e2fe2- Align with "PublicationsRetriever's" updated "couldRetry" and "wasValid" logic. - Update dependencies.Lampros Smyrnaios2022-11-11 16:02:20 +0200
90a69686cf- When the Worker is about to shut-down, after deleting all the handled assignments' files, check for remaining full-texts in the local storage and warn the user. If no remaining files were found, then delete the parent fulltexts' directory. - Polish the code.Lampros Smyrnaios2022-11-02 02:27:04 +0200
6450a4b8ac- Add check for ZERO value of "totalZipBatches", in "FullTextsController.getMultipleFullTexts()". - Improve or comment-out some log-messages. - Disable the empty SpringBootTest, as it caused building problems.Lampros Smyrnaios2022-10-06 16:59:45 +0300
4b85b092feHandle the new "HttpStatus.MULTI_STATUS"-response from the Controller, inside "AssignmentsHandler.postWorkerReport()".Lampros Smyrnaios2022-09-28 22:41:43 +0300
b051e10fd3- Fix a bug, causing the domainAndPath-tracking data to be deleted after every batch, after the initial threshold was reached. Now the thresholds increase, along with the processed id-urls, in order to clear data, e.g. every 300_000 processed id-urls, as intended. - Use different thresholds for clearing just the "domainAndPath"-blocking-data and all-tracking-data.Lampros Smyrnaios2022-09-28 19:10:01 +0300
373bfa810b- Apply a "shouldShutdownWorker"-check in "ScheduledTasks.handleNewAssignments()", when there was a "connection-error" in the previous request. This makes sure that the Worker will honor the user's shut down request, even if it's "stuck" in a connection-error loop. - Optimize the input-streams creation in the "FullTextsController".Lampros Smyrnaios2022-09-12 16:48:44 +0300
d73a99b1c0- Increase the security of "shutdownWorker" and "cancelShutdownWorker" endpoints, by only allowing the requests, which come from the same machine. - Update the "UriBuilder.java" to be able to take the running port of the server, in case the port-number was initially set to "random" (0).Lampros Smyrnaios2022-09-12 16:38:44 +0300
25070d7aba- Lower the thresholds for how often to clear the data-structures. - Clear the "ConnSupportUtils.domainsWithConnectionData" data-structure, after each batch. - Move the code for handling the "CookieStore" inside the "PublicationsRetrieverPlugin", as it is more related to that.Lampros Smyrnaios2022-07-04 18:42:05 +0300
5035094e44- Move the "shutdownOrCancelCode" input into the "inputDataFile" provided by the user, for convenience and to be able to make this "auth-code" mandatory. Previously, it was optional and the app could not be made to stop in a normal-manner, if this code was not provided. - Improve the instructions and the error-messages for the "inputDataFile".Lampros Smyrnaios2022-06-28 16:00:11 +0300
d91732bc16- Add deletion, of the cookies in the newly-supported CookieManager, after each batch. - Update the Spring-Security-code to use the "SecurityFilterChain", as the previous code was deprecated. - Update dependencies. - Code cleanup.Lampros Smyrnaios2022-06-27 17:58:02 +0300
26cbb83b51- Add the "shutdownWorker"-endpoint to accept requests for shutting-down the Worker, gracefully, after it completes its current work (including sending the publications-files to the Controller). A user-defined "auth-code" is required. - Add the "cancelShutdownWorker"-endpoint to cancel a previous "shutdownWorker"-request. A user-defined "auth-code" is required.Lampros Smyrnaios2022-06-22 18:53:27 +0300
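
Note on commit 81b61b530f (pre-processing the assignments-list to open "space" between same-domain assignments): the commit message does not describe the actual algorithm, so the following is only a minimal sketch of one possible approach, a round-robin interleave over per-domain queues. The class name "DomainSpreader" and the methods "domainOf" and "spreadByDomain" are hypothetical and are not taken from the Worker's code.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not the Worker's real implementation.
class DomainSpreader {

    // Extract the host of a url; fall back to the raw string if it cannot be parsed.
    static String domainOf(String url) {
        try {
            String host = URI.create(url).getHost();
            return (host != null) ? host : url;
        } catch (IllegalArgumentException e) {
            return url;
        }
    }

    // Reorder the urls so that urls of the same domain are spread apart as much as
    // a simple round-robin allows. The relative order inside each domain is kept.
    static List<String> spreadByDomain(List<String> urls) {
        // Group the urls per domain, keeping the first-seen order of the domains.
        Map<String, Deque<String>> byDomain = new LinkedHashMap<>();
        for (String url : urls) {
            byDomain.computeIfAbsent(domainOf(url), k -> new ArrayDeque<>()).add(url);
        }

        // Take one url from each domain per round, until every queue is drained.
        List<String> result = new ArrayList<>(urls.size());
        while (result.size() < urls.size()) {
            for (Deque<String> queue : byDomain.values()) {
                String next = queue.poll();
                if (next != null) {
                    result.add(next);
                }
            }
        }
        return result;
    }
}
```

With such an ordering, consecutive tasks handed to the threads tend to target different domains, so fewer threads wait on the "politeness-delay" at the same time. When one domain dominates the batch, its remaining urls still end up adjacent at the tail of the list.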