Commit Graph

  • b70ae3ed58 - Improve error-handling. - Use new class-model from PublicationsRetriever software. - Set new version. master 2.1.15 Lampros Smyrnaios 2024-12-06 00:23:46 +0200
  • 83e6d761dd - Increase the initial delay for "checkAndDeleteOldFiles" to 54 hours, in order to avoid obsolete checks earlier, since a directory needs to have been last accessed at least 48 hours ago, while some additional hours are needed for the first batch to be finished. - Improve a log-message. Lampros Smyrnaios 2024-12-05 17:56:15 +0200
  • fa7b2da926 Make sure the "already-retrieved" fulltexts are deleted as early as possible. These are the files whose hash was added to the Database sometime in the past and which the Controller has not requested now. Lampros Smyrnaios 2024-12-04 00:59:00 +0200
  • bb521cc493 Update the logic behind sending the "shutdownReport" to the Controller: - Make sure the Controller is informed about the about-to-shut-down state of the worker, even if it is being shut down without the use of the api. - Send the report as late as possible, in order for the Controller to know the exact time the Worker was shut down. Lampros Smyrnaios 2024-12-04 00:45:22 +0200
  • 6e4f91f20d - Set next version. - Code polishing. - Update dependencies. - Optimize Gradle's configuration. Lampros Smyrnaios 2024-12-04 00:26:06 +0200
  • f05c98de20 Revert the reduction in the amount of "requiredFreeSpace" needed to be available in order to accept new assignments, which happened in commit: 4af74d4581 Lampros Smyrnaios 2024-12-03 19:54:11 +0200
  • 393c9e037e Add missing import. 2.1.14 Lampros Smyrnaios 2024-11-13 20:02:03 +0200
  • 40ffc572e3 Add missing call. Lampros Smyrnaios 2024-11-13 19:59:48 +0200
  • ef33e91553 - Use newer api from "PublicationsRetriever" software. - Optimize memory-de-allocation. - Optimize Gradle settings. - Set new version. - Update dependencies. Lampros Smyrnaios 2024-11-13 19:10:05 +0200
  • 3eaeff468a Set new version. 2.1.13 Lampros Smyrnaios 2024-10-25 19:04:21 +0300
  • 0b3ee3e16e - Set next version. - Update dependencies. Lampros Smyrnaios 2024-10-24 20:22:40 +0300
  • ea17ec917b - Set new version. - Update dependencies. 2.1.12 Lampros Smyrnaios 2024-06-11 11:59:46 +0300
  • d630f16198 Improve the compression of fulltext files: - Fix not using the big bufferSize it was supposed to use. - Make sure the maximum compression-level is used. Before, the invalid value "bufferSize" was passed as the level, and it is unclear to which real compression-level it was changed inside the zstd-library (19 or 22 (only allowed through "ultra mode")), probably to the ultra-level though, as this "switch" seems to be required only through the cli. - Exclude the possibly outdated "commons-compress" transitive dependency from the "publications_retriever" dependency. Lampros Smyrnaios 2024-06-10 18:21:35 +0300
  • 107908a733 - Fix not deleting the "assignments_*" directory, along with the potentially partially created zstd file, in case there was a compression error. - Show the number of files which were successfully compressed, in each batch. - Fix the class-value used in the Logger-initializer, in "FullTextsController". - Improve an error-log. Lampros Smyrnaios 2024-05-30 12:29:02 +0300
  • 4af74d4581 - Reduce the amount of "requiredFreeSpace" needed to be available in order to accept new assignments. - Increase the time to wait before rechecking the available free space, in order to get new assignments, to 30 minutes. - Update dependencies. - Code polishing. Lampros Smyrnaios 2024-05-28 23:10:52 +0300
  • c242f65518 - Improve error-handling in "ConnWithController.postShutdownReportToController()". - Update dependencies. 2.1.11 Lampros Smyrnaios 2024-05-22 16:14:45 +0300
  • b40c72f78f - Fix the process of shutting down the worker, in case the user sends the relevant request, while the worker is stuck in a data-request error-loop. - Upload the updated gradle-wrapper. 2.1.10 Lampros Smyrnaios 2024-04-29 17:08:40 +0300
  • 34407179fc Update Gradle in the install script. 2.1.9 Lampros Smyrnaios 2024-04-26 15:02:50 +0300
  • 7857ce1f05 Update Gradle. Lampros Smyrnaios 2024-04-26 15:02:07 +0300
  • 795d6e7c93 - Update README. - Update dependencies. Lampros Smyrnaios 2024-04-26 13:36:41 +0300
  • 736d0f8526 Add a missing change in logback-spring.xml. Lampros Smyrnaios 2024-02-08 20:04:45 +0200
  • cb736a8d66 Add the Jenkins' build-status badge in README. Lampros Smyrnaios 2024-02-08 19:12:04 +0200
  • 5d7465df3c Add some gradle files to be used by Jenkins. Lampros Smyrnaios 2024-02-08 19:06:54 +0200
  • 24c4a75acf - Use the "RollingFile" logs-appender by default. - Set the next version. Lampros Smyrnaios 2024-02-08 18:51:10 +0200
  • 50d756d582 - Automatically use the latest version of "publications_retriever" software from the Nexus maven-repository. - Update Gradle. - Update License. - Configure the destination of the logs in the "application.properties" file. Lampros Smyrnaios 2024-02-08 18:33:18 +0200
  • 3909104a1b - Update a dependency. - Set new version. 2.1.8 Lampros Smyrnaios 2024-01-15 13:54:12 +0200
  • c4770ee716 Set new version. 2.1.7 Lampros Smyrnaios 2023-12-22 12:40:26 +0200
  • 066d6f665f - Take into account the new "errorMsg" value returned by "LoaderAndChecker.getWasValidAndCouldRetry()". - Update dependencies. Lampros Smyrnaios 2023-12-18 15:17:51 +0200
  • bad9544c58 - Improve error-handling. - Improve a log-message. 2.1.6 Lampros Smyrnaios 2023-11-29 13:49:54 +0200
  • 5a9e7228ae - Set the upcoming version. - Update dependencies. Lampros Smyrnaios 2023-11-27 13:02:33 +0200
  • 9073f56227 Revert the "read-timeout" value back to 1 hour, as there is not that big of a problem with the load of either server; it's a frequent network-"lag" that causes the issue, which is not solved even with 2 hours of waiting. Lampros Smyrnaios 2023-10-31 15:08:57 +0200
  • 69ea5b6d19 - Increase the "ReadTimeout" to 2 hours, as the Worker struggles to get the assignments-data in time. - Revert the change about special handling of the "RestClientException". The exMsg was appearing in a different line, in the logs, and was a "SocketTimeoutException". 2.1.5 Lampros Smyrnaios 2023-10-27 18:39:10 +0300
  • bfa76e9484 - Show the full stacktrace in the weird case of a "RestClientException" without an exception-message. Also, in this case, retry immediately, as there is no long-lasting network problem that requires some time between requests, but most probably a random interruption. - Code polishing. Lampros Smyrnaios 2023-10-27 17:36:54 +0300
  • 10e39d79a4 - Improve a log-message. - Update dependencies. 2.1.4 Lampros Smyrnaios 2023-10-20 17:35:39 +0300
  • 1b45f384a7 - In case a faulty "assignmentsCounter" was given to the "addReportResultToWorker"-endpoint, then return an explanatory error-message along with the HTTP-404 error. - Update Gradle. Lampros Smyrnaios 2023-10-06 15:45:53 +0300
  • 01e378ea66 - Add progress-report-log for assignments-processing. - Code polishing. Lampros Smyrnaios 2023-10-05 12:02:52 +0300
  • 18cc9e0e68 - Improve error-handling in file-compression. - Update dependencies. Lampros Smyrnaios 2023-10-04 16:08:38 +0300
  • 2895668417 - Add LICENSE. - Code polishing. Lampros Smyrnaios 2023-09-14 16:09:20 +0300
  • 49cd0c19c2 - Increase the "hoursToWaitBeforeDeletion" to 48. - Adjust the number and size of log files. 2.1.3 Lampros Smyrnaios 2023-08-31 17:54:07 +0300
  • e85282d35b Update the "addReportResultToWorker"-endpoint to check if the given "assignmentsCounter" was handled by that worker, without considering the related full-texts directory, since that may have been deleted in the meantime. Lampros Smyrnaios 2023-08-31 17:52:52 +0300
  • b579296ada - Code optimization and polishing. - Update dependencies. Lampros Smyrnaios 2023-08-28 16:11:26 +0300
  • dc97b323c9 - Show a warning, if the "numOfUnretrievedFiles" is over 50. - Delete gradle .zip file after installation. - Code polishing. 2.1.2 Lampros Smyrnaios 2023-08-04 15:33:48 +0300
  • 088cf73b30 - Update dependencies. - Code optimization and polishing. 2.1.1 Lampros Smyrnaios 2023-07-27 17:46:17 +0300
  • 952bf7c035 - Update dependencies. - Code polishing. 2.1 Lampros Smyrnaios 2023-07-06 13:22:09 +0300
  • 33df46f6f5 - Improve README. - Update and cleanup dependencies. - Code polishing. Lampros Smyrnaios 2023-06-22 12:47:36 +0300
  • 9c897b8bf4 - Make use of the new Normalizer utilized by the PublicationRetriever plugin. - Code polishing. Lampros Smyrnaios 2023-06-10 02:40:45 +0300
  • 2aedae2367 - In case a serious error happened while processing the assignments, instead of shutting down immediately, now the Worker shuts down the executor service, registers that it will shut down soon and waits for the Controller to retrieve the already downloaded full-text files. - In case the full-texts' subdirectory could not be created, then terminate the "handleAssignment" method immediately. No posting of a faulty workerReport to the Controller should happen. - Code polishing. Lampros Smyrnaios 2023-05-31 15:25:36 +0300
  • 4a95826f58 - Avoid processing the assignments, for which the assignments_full-texts subdirectory cannot be created. - Avoid a double-log. Lampros Smyrnaios 2023-05-31 02:27:24 +0300
  • 7f3ca80959 Bypass url-canonicalization for urls containing certain uncommon characters which cause the urls to get rejected. Lampros Smyrnaios 2023-05-30 19:45:14 +0300
  • a9b1b20a51 - Prevent running out of space, by checking the available free space and stalling the acquisition of new assignments until more free space becomes available. - Fix missing change. Lampros Smyrnaios 2023-05-30 17:58:29 +0300
  • 84f29ea7e0 Update versioning. Lampros Smyrnaios 2023-05-30 15:22:20 +0300
  • 0908dcab8a Use a single "restTemplate" object, with the same timeouts (a bit increased from the old requestRestTemplate, to account for a possibly overloaded Controller), since we no longer need to wait for hours until the workerReport is processed by the Controller. Lampros Smyrnaios 2023-05-29 14:15:55 +0300
  • 2b69733912 - Increase the test-delays of the scheduled tasks. - Update dependencies. 2.0 Lampros Smyrnaios 2023-05-29 12:45:43 +0300
  • f57314908a - Improve elapsed time precision for the "lastModified" metadata of the assignments-fulltext subDirectories. - Code polishing. Lampros Smyrnaios 2023-05-25 00:37:44 +0300
  • 1bf27a5a4e - Fix a bug, which caused the old full-text files to not be deleted. - Reduce the "InitialDelay" for the "checkIfShouldShutdown" scheduler. Lampros Smyrnaios 2023-05-24 16:47:53 +0300
  • 0ca02f3587 Change the delay values of scheduledTasks to production ones. Lampros Smyrnaios 2023-05-24 13:56:20 +0300
  • bfa569685a - Use the "POST" method for shutdown and cancelShutdown requests. - Polish some messages. Lampros Smyrnaios 2023-05-23 22:24:49 +0300
  • 9fdaa9503b - Delete any left-over full-texts after 36 hours. - Upon shutting down, post a "shutdownReport" to the Controller. Lampros Smyrnaios 2023-05-23 22:22:57 +0300
  • 903032f454 - After a WorkerReport has been sent, ask for new assignments immediately. So, the Worker does not have to wait for hours for the Controller to check for duplicate files in the DB, retrieve and upload the full-texts and insert the records to the DB. - Special care is taken to delete the delivered full-texts as soon as possible. - Write the workerReport to a json-file, in case something goes wrong, and keep it until the Controller notifies the Worker that the processing was successful. Lampros Smyrnaios 2023-05-23 22:19:41 +0300
  • 9cb43b3d94 - Improve startup speed, by using a faster remote server to get the host machine's public IP. This also reduces the risk of not being able to get the public IP at all. - Set the App to gracefully shut down the WebServer and wait up to 2 minutes. - Increase the waiting time for the "PublicationsRetriever.executor" to shut down, to 2 minutes. Lampros Smyrnaios 2023-05-23 20:17:58 +0300
  • 4d90846261 - In case the specified "controllerIP" is actually a domain-name, find its IP-address, so that a proper IP-to-IP comparison can be performed and the "securityChecks" can pass. - Increase the "read-timeout" when searching for the host machine's public-IP. - Update dependencies. - Code polishing. Lampros Smyrnaios 2023-05-22 21:25:22 +0300
  • bd0ead816d Make the time-out value for "restTemplateForReport" scale with the "maxAssignmentsLimitPerBatch". Lampros Smyrnaios 2023-05-16 19:08:59 +0300
  • 93d1aa9588 - Fix a missing change. - Add todo. Lampros Smyrnaios 2023-05-15 13:41:53 +0300
  • cc55354e73 Show the worker-id when the worker starts. Lampros Smyrnaios 2023-05-15 13:22:55 +0300
  • 714938531b - Add the time-zone in the logs. - Code polishing. Lampros Smyrnaios 2023-05-11 03:14:56 +0300
  • 29a54f0b30 Remove the "shutDownOrCancelCode" from security checks, since we have an IP whitelisting mechanism in place. Lampros Smyrnaios 2023-05-03 15:15:46 +0300
  • 4eac7c5c66 Fix typo in property's name. Lampros Smyrnaios 2023-04-29 18:10:35 +0300
  • 0ea7bccadb Leave the Max-Heap-Size at 8 GB; we assume that enough swap space will be available on the host. We can still override the max-heap-size if desired. Lampros Smyrnaios 2023-04-29 17:55:03 +0300
  • d5a997ad3d Use restTemplates with different read timeouts depending on the operation. For the assignments-request we need a shorter read timeout than the one we need for the worker-report. This guarantees that the connection does not hang for so long, when the Controller crashes before sending the assignments. Lampros Smyrnaios 2023-04-29 17:24:16 +0300
  • 53ab51922a Allow shutdown requests from the Controller. Lampros Smyrnaios 2023-04-28 23:46:39 +0300
  • fcd80a8f3f Add the "./createSwapStorage.sh" script. Lampros Smyrnaios 2023-04-28 20:55:58 +0300
  • 7b7dd59b57 - Increase the "max_heap_size". - Update a dependency. - Update README.md Lampros Smyrnaios 2023-04-28 19:37:12 +0300
  • ec4d084972 Reduce memory usage. Lampros Smyrnaios 2023-04-28 17:59:36 +0300
  • 0ba15dd31a Increase the "requestReadTimeoutDuration" to 10 hours, as the number of full-texts to be transferred to the Controller keeps getting larger. Lampros Smyrnaios 2023-04-26 15:08:46 +0300
  • 344bc46e08 Update Gradle. Lampros Smyrnaios 2023-04-22 06:44:04 +0300
  • 0997558347 Update dependencies. Lampros Smyrnaios 2023-04-20 15:39:15 +0300
  • 796e46bc99 Update dependencies. Lampros Smyrnaios 2023-03-27 19:44:49 +0300
  • 839a797124 - Improve performance of full-texts transferring to the Controller, by preloading some bytes for faster response to the Controller's read requests. - Optimize the directories-creation process by eliminating the additional check for existence, as that check already takes place inside the "mkdirs()" method. - Remove the obsolete code which used a different base-dir in case the specific assignments' subdirectory failed to be created. Since the user-defined baseDir has already been successfully created upon initialization, any problem on creating subdirectories inside that base-directory will most likely persist even when changing the base directory. Additionally, even if the creation of the subdirectory under the changed base-directory succeeded, the "FullTextsController.getFullTexts()" method would not use it, resulting in errors. - Code polishing. Lampros Smyrnaios 2023-03-08 13:12:17 +0200
  • 4da54e7a7d - Show a warning, in case the number of archived files is different from the number of requested files. - Code polishing. - Update Gradle. Lampros Smyrnaios 2023-03-07 16:25:10 +0200
  • ec09ecc7ff - Refactor and Spring-ify the File-storage initialization process. - Fix the problematic file-storage-path (it could not be used when the Controller was requesting the full-texts), which was produced when the user-defined path could not be created. Lampros Smyrnaios 2023-03-07 16:21:32 +0200
  • ba989484e4 Improve performance when archiving and compressing the full-texts. Lampros Smyrnaios 2023-03-02 17:47:58 +0200
  • ff4fd3d289 - Show the elapsed time for each assignments-request to be processed by the Worker. - Update dependencies. Lampros Smyrnaios 2023-03-02 17:34:44 +0200
  • 66d3f7bcb2 - Show a warning, in case the number of results is different from the number of the assignments (due to missing / double logging). - Update Spring. Lampros Smyrnaios 2023-02-24 23:27:02 +0200
  • 81b61b530f Drastically improve performance by applying a pre-processing algorithm for the assignments-list to open some "space" between assignments which have the same domain, which, in turn, causes the threads to block less during execution. (The threads block, due to the mandatory "politeness-delay" before reconnecting with the same domain, in order to avoid overloading the remote servers.) Lampros Smyrnaios 2023-02-24 23:23:37 +0200
  • 84a37bd4b7 - Handle the case, where an instance of a urlReport record (having the same id and sourceUrl), may have failed to give a docUrl, due to an error, even if another instance gives the docUrl and the docFile. The absence of that handling could lead to a record-instance being assigned a "fileLocation" which was actually an error-message (comment), and as a result the real "fileLocation" would never have been assigned, so the payload would be lost. - Improve exceptions-handling. Lampros Smyrnaios 2023-02-21 15:22:49 +0200
  • 9888349bef Update Gradle. Lampros Smyrnaios 2023-02-20 19:14:16 +0200
  • 0dd2b6c46f Rename "getFullTextsImproved"-endpoint to simply "getFullTexts", now that this is stable. Lampros Smyrnaios 2023-02-16 14:23:47 +0200
  • 0626e85894 Update dependencies. Lampros Smyrnaios 2023-02-15 16:18:33 +0200
  • 13f56d16c0 Inform the user, if a previous "shutdownWorker"-request has been given, in "GeneralController.shutdownWorkerGracefully()"-endpoint. Lampros Smyrnaios 2023-02-01 16:41:40 +0200
  • b98ea92dec Update/improve documentation. Lampros Smyrnaios 2023-01-27 14:27:57 +0200
  • 24b52fba63 - Refactor the initialization and configuration process and Spring-ify the project. - Update Spring dependency. Lampros Smyrnaios 2023-01-25 18:33:49 +0200
  • d6ff62d2ef Update the "installAndRun.sh" script: - Add the ability to build and run the app without re-installing the PublicationsRetriever library. This is useful when trying a non-published version of that library. - Fix a wrong variable-name. Lampros Smyrnaios 2023-01-20 01:59:26 +0200
  • bd0d9eb36f - Delete the transferred full-texts as soon as possible, in order to mitigate the "No space left on device"-error, which may appear, in case we have some very large files. - Use the new "GenericUtils.clearBlockingData()" method from the "PublicationsRetriever" library. - Remove the deprecated "getMultipleFullTexts"-endpoint, along with the Zip-related code. Lampros Smyrnaios 2023-01-18 16:55:59 +0200
  • 7dd5719bff - Update a method-call, to reflect the latest changes in the "PublicationsRetriever"-software. - Add TODOs. Lampros Smyrnaios 2023-01-17 18:25:49 +0200
  • c283cb4365 Improve exception-handling in "AssignmentsHandler.postWorkerReport()". Lampros Smyrnaios 2023-01-16 15:22:32 +0200
  • d96d0c68cd Make sure the "responseCode" is "200-OK", before trying to get the InputStream in "UriBuilder.getPublicIP()". Lampros Smyrnaios 2023-01-11 16:02:31 +0200
  • fd62ac567e - Add a new endpoint "getFullTextsImproved" which uses Facebook's [**Zstandard**](https://facebook.github.io/zstd/) compression algorithm, which brings very big benefits on compression rate and speed. - Remove some dependencies. Lampros Smyrnaios 2023-01-09 15:48:30 +0200
  • 778dc6e25c - Improve the stability of "UriBuilder.getPublicIP()", by using a "HttpURLConnection" to increase the connection and read timeouts and avoid timeout-exceptions. - Show the number of assignments which are requested from the Controller, in the log-message. - Update Spring. Lampros Smyrnaios 2023-01-03 18:43:26 +0200
  • 378db2ff2f - Add an existence-check for the "publications_retriever"-JAR, before trying to make a backup, inside "installAndRun.sh". - Add a final logging message, right before the app shuts down. Lampros Smyrnaios 2022-12-15 14:15:24 +0200
  • 8c1daadad0 - Increase the "requestReadTimeoutDuration" to 5 hours. - Improve gradle's performance. Lampros Smyrnaios 2022-12-12 17:49:14 +0200