Commit Graph

24 Commits

Author SHA1 Message Date
Lampros Smyrnaios dc97b323c9 - Show a warning, if the "numOfUnretrievedFiles" is over 50.
- Delete gradle .zip file after installation.
- Code polishing.
2023-08-04 15:33:48 +03:00
Lampros Smyrnaios f57314908a - Improve elapsed time precision for the "lastModified" metadata of the assignments-fulltext subDirectories.
- Code polishing.
2023-05-25 00:37:44 +03:00
Lampros Smyrnaios 903032f454 - After a WorkerReport has been sent, ask for new assignments immediately. So, the Worker does not have to wait for hours for the Controller to check for duplicate files in the DB, retrieve and upload the full-texts and insert the records to the DB.
- Special care is taken to delete the delivered full-texts as soon as possible.
- Write the workerReport to a json-file, in case something goes wrong, and keep it until the Controller notifies the Worker that the processing was successful.
2023-05-23 22:19:41 +03:00
Lampros Smyrnaios 714938531b - Add the time-zone in the logs.
- Code polishing.
2023-05-11 03:14:56 +03:00
Lampros Smyrnaios ec4d084972 Reduce memory usage. 2023-04-28 17:59:36 +03:00
Lampros Smyrnaios 839a797124 - Improve performance of full-texts transferring to the Controller, by preloading some bytes for faster response to the Controller's read requests.
- Optimize directories-creation process by eliminating the additive check for existence, as that check already takes place inside the "mkdirs()" method.
- Remove the obsolete code which in case the specific assignments' subdirectory failed to be created, then a different base-dir was used instead. Since the user-defined baseDir is already been successfully created upon initialization, any problem on creating subdirectories inside that base-directory will most likely persist even when changing the base directory. Additionally, even if the subdirectory with the changed base-directory succeeded, the "FullTextsController.getFullTexts()" method would not use it, resulting in errors.
- Code polishing.
2023-03-08 13:12:17 +02:00
Lampros Smyrnaios ec09ecc7ff - Refactor and Spring-ify the File-storage initialization process.
- Fix the problematic file-storage-path (it could not be used when the Controller was requesting the full-texts), which was produced when the user-defined path could not be created.
2023-03-07 16:21:32 +02:00
Lampros Smyrnaios 0dd2b6c46f Rename "getFullTextsImproved"-endpoint to simply "getFullTexts", now that this is stable. 2023-02-16 14:23:47 +02:00
Lampros Smyrnaios 24b52fba63 - Refactor the initialization and configuration process and Spring-ify the project.
- Update Spring dependency.
2023-01-25 18:33:49 +02:00
Lampros Smyrnaios bd0d9eb36f - Delete the transferred full-texts as soon as possible, in order to mitigate the "No space left on device"-error, which may appear, in case we have some very large files.
- Use the new "GenericUtils.clearBlockingData()" method from the "PublicationsRetriever" library.
- Remove the deprecated "getMultipleFullTexts"-endpoint, along with the Zip-related code.
2023-01-18 16:55:59 +02:00
Lampros Smyrnaios fd62ac567e - Add a new endpoint "getFullTextsImproved" which uses Facebook's [**Zstandard**](https://facebook.github.io/zstd/) compression algorithm, which brings very big benefits on compression rate and speed.
- Remove some dependencies.
2023-01-09 15:48:30 +02:00
Lampros Smyrnaios 778dc6e25c - Improve the stability of "UriBuilder.getPublicIP()", by using a "HttpURLConnection" to increase the connection and read timeouts and avoid timeout-exceptions.
- Show the number of assignments which are requested from the Controller, in the log-message.
- Update Spring.
2023-01-03 18:43:26 +02:00
Lampros Smyrnaios 6c17e86c70 Code polishing. 2022-12-09 12:53:08 +02:00
Lampros Smyrnaios d37cd738a0 Refactor the full-texts deletion process to reduce storage space and complexity:
- Delete the assignments-batch full-texts after the whole procedure (for each assignments-batch) is finished, either successfully or not.
- Do not check for remaining files, when the Worker shuts down, since, in case of problematic handling the files are deleted anyway.

The full-texts are not needed to be kept, in case of an error, since the Controller will reassign the non-downloaded id-url records to some worker (maybe different) and these files will be downloaded again and handled there.

Also, change the "assignmentsNumsHandled" to hold data only for assignments which are handled all the way, including the upload of the full-texts from the Controller and also the insertion of the WorkerReport to the database.
2022-12-07 12:29:05 +02:00
Lampros Smyrnaios 326af0f12d - Return a success-message in the response-body, of the "shutdownWorkerGracefully" and "cancelShutdownWorkerGracefully" endpoints.
- Apply the checks for the "totalZipBatches" param, before the Worker-related checks, in "FullTextsController.getMultipleFullTexts()"
- Show the Heap-sizes in megabytes.
2022-12-05 21:58:16 +02:00
Lampros Smyrnaios 90a69686cf - When the Worker is about to shut-down, after deleting all the handled assignments' files, check for remaining full-texts in the local storage and warn the user. If no remaining files were found, then delete the parent fulltexts' directory.
- Polish the code.
2022-11-02 02:27:04 +02:00
Lampros Smyrnaios 6450a4b8ac - Add check for ZERO value of "totalZipBatches", in "FullTextsController.getMultipleFullTexts()".
- Improve or comment-out some log-messages.
- Disable the empty SpringBootTest, as it caused building problems.
2022-10-06 16:59:45 +03:00
Lampros Smyrnaios 373bfa810b - Apply a "shouldShutdownWorker"-check in "ScheduledTasks.handleNewAssignments()", when there was a "connection-error" in the previous request. This makes sure that the Worker will honor the user's shut down request, even if it's "stuck" in a connection-error loop.
- Optimize the input-streams creation in the "FullTextsController".
2022-09-12 16:48:44 +03:00
Lampros Smyrnaios c46c8c448a - Upgrade the zip-file delivery by using the "InputStreamResource". This way is more reliable, have better performance and uses less memory.
- Use the "InputStreamResource" also in "get(single)FullText"-endpoint, in order to avoid loading a big full-text file in memory.
- Decrease the system-reserved memory by 128 MB.
- Fix path-variable regexes for "getFullText"-endpoint.
- Optimize imports.
- Code cleanup.
2021-12-17 08:25:54 +02:00
Lampros Smyrnaios 859f850f56 - Improve Zip-file delivery and significantly decrease memory consumption, by streaming it, instead of loading the whole file in memory before sending it to the Controller.
- Fix not closing the inputStream of the zip-file.
- Count and show the number of files which were zipped in each batch.
2021-12-13 15:29:03 +02:00
Lampros Smyrnaios fd5b56e3c6 - Allow the user to set the "maxAssignmentsLimitPerBatch" value.
- Set increased lower and upper limits for the Java Heap Size.
- Update the "ServerBaseURL" to the Public IP Address of the machine which is running the app.
- Improve two log-messages.
2021-12-07 00:52:40 +02:00
Lampros Smyrnaios ce49bff50e - Reduce memory consumption when loading a zipFile.
- Check whether the "zipBatchCounter" is larger than the "totalZipBatches".
- Improve the "failed tasks" log-message.
2021-12-03 16:29:16 +02:00
Lampros Smyrnaios 018326eedd - Optimize the "FileZipper.zipMultipleFilesAndGetZip()" and "FileZipper.zipAFile()" methods.
- Improve the "getMultipleFullTexts"-endpoint. Check if the "fileNamesWithExtensions"-list is empty. Check if the baseDir for the fullTexts of a given assignments-counter is missing.
- Optimize the "PublicationsRetrieverPlugin.processAssignments()" method.
- Set a max-size limit to the amount of space the logs can use. Over that size, the older logs will be deleted.
- Show the heap size, in the beginning.
- Update Gradle.
- Code cleanup.
2021-12-03 04:09:40 +02:00
Lampros Smyrnaios 20b71164d5 - The worker will store the files in its local file-system and will send them to the controller in batches, after the latter requests them. When all files from a given assignments-num are sent, the files will be deleted from the Worker, in a scheduled-job.
- Implement the "getFullTexts"-endpoint, which returns the requested full-texts in a zip file.
- Implement the "getFullText"-endpoint, which returns the requested full-text.
- Implement the "getHandledAssignmentsCounts"-endpoint which returns the assignments-numbers, which were handled by that worker.
- Make sure each urlReport has the same "Date" for a given assignments-number. Also, make sure the "size" and "hash" have a "null" value, in case the full-text was not found.
- Check and log thread-pool shutdown errors.
- Add the stack-trace in the error-logs, instead of the Stderr.
- Update SpringBoot dependency.
- Change log levels.
- Code cleanup.
2021-11-26 17:04:31 +02:00