Lampros Smyrnaios
3eaeff468a
Set new version.
2024-10-25 19:04:21 +03:00
Lampros Smyrnaios
0b3ee3e16e
- Set next version.
- Update dependencies.
2024-10-24 20:22:40 +03:00
Lampros Smyrnaios
ea17ec917b
- Set new version.
- Update dependencies.
2024-06-11 11:59:46 +03:00
Lampros Smyrnaios
d630f16198
Improve the compression of fulltext files:
- Fix not using the big bufferSize it was supposed to use.
- Make sure the maximum compression-level is used. Before, the invalid value "bufferSize" was passed as the level, and it is unclear which actual compression-level the zstd-library changed it to (19, or 22, which is only allowed through "ultra mode"). It was probably the ultra-level, as this "switch" seems to be required only through the cli.
- Exclude the possibly outdated "commons-compress" transitive dependency from the "publications_retriever" dependency.
2024-06-10 18:21:35 +03:00
Lampros Smyrnaios
4af74d4581
- Reduce the amount of "requiredFreeSpace" needed to be available in order to accept new assignments.
- Increase the time to wait before rechecking the available free space, in order to get new assignments, to 30 minutes.
- Update dependencies.
- Code polishing.
2024-05-28 23:10:52 +03:00
Lampros Smyrnaios
c242f65518
- Improve error-handling in "ConnWithController.postShutdownReportToController()".
- Update dependencies.
2024-05-22 16:14:45 +03:00
Lampros Smyrnaios
795d6e7c93
- Update README.
- Update dependencies.
2024-04-26 13:36:41 +03:00
Lampros Smyrnaios
24c4a75acf
- Use the "RollingFile" logs-appender by default.
- Set the next version.
2024-02-08 18:51:10 +02:00
Lampros Smyrnaios
50d756d582
- Automatically use the latest version of "publications_retriever" software from the Nexus maven-repository.
- Update Gradle.
- Update License.
- Configure the destination of the logs in the "application.properties" file.
2024-02-08 18:33:18 +02:00
Lampros Smyrnaios
3909104a1b
- Update a dependency.
- Set new version.
2024-01-15 13:54:12 +02:00
Lampros Smyrnaios
c4770ee716
Set new version.
2023-12-22 12:40:26 +02:00
Lampros Smyrnaios
066d6f665f
- Take into account the new "errorMsg" value returned by "LoaderAndChecker.getWasValidAndCouldRetry()".
- Update dependencies.
2023-12-18 15:17:51 +02:00
Lampros Smyrnaios
5a9e7228ae
- Set the upcoming version.
- Update dependencies.
2023-11-27 13:02:33 +02:00
Lampros Smyrnaios
10e39d79a4
- Improve a log-message.
- Update dependencies.
2023-10-20 17:35:39 +03:00
Lampros Smyrnaios
18cc9e0e68
- Improve error-handling in file-compression.
- Update dependencies.
2023-10-04 16:08:38 +03:00
Lampros Smyrnaios
b579296ada
- Code optimization and polishing.
- Update dependencies.
2023-08-28 16:11:26 +03:00
Lampros Smyrnaios
088cf73b30
- Update dependencies.
- Code optimization and polishing.
2023-07-27 17:46:17 +03:00
Lampros Smyrnaios
952bf7c035
- Update dependencies.
- Code polishing.
2023-07-06 13:22:09 +03:00
Lampros Smyrnaios
33df46f6f5
- Improve README.
- Update and cleanup dependencies.
- Code polishing.
2023-06-22 12:47:36 +03:00
Lampros Smyrnaios
84f29ea7e0
Update versioning.
2023-05-30 15:22:33 +03:00
Lampros Smyrnaios
2b69733912
- Increase the test-delays of the scheduled tasks.
- Update dependencies.
2023-05-29 12:45:43 +03:00
Lampros Smyrnaios
903032f454
- After a WorkerReport has been sent, ask for new assignments immediately. This way, the Worker does not have to wait for hours for the Controller to check for duplicate files in the DB, retrieve and upload the full-texts, and insert the records into the DB.
- Special care is taken to delete the delivered full-texts as soon as possible.
- Write the workerReport to a json-file, in case something goes wrong, and keep it until the Controller notifies the Worker that the processing was successful.
2023-05-23 22:19:41 +03:00
Lampros Smyrnaios
4d90846261
- In case the specified "controllerIP" is actually a domain-name, find its IP-address, so that a proper IP-to-IP comparison can be performed and the "securityChecks" can pass.
- Increase the "read-timeout" when searching for the host machine's public-IP.
- Update dependencies.
- Code polishing.
2023-05-22 21:25:22 +03:00
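The domain-name handling described in commit 4d90846261 can be illustrated with a minimal sketch. It assumes the configured "controllerIP" may be either an IP literal or a host name, and that the check compares it with the address of an incoming request. The class and method names (`ControllerAddressCheck`, `toIp`, `isFromController`) are hypothetical and are not taken from the project's code.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical sketch, not the project's implementation.
class ControllerAddressCheck {

    // Resolve a host name (or an IP literal) to its textual IP address.
    // For an IP literal, only the format is validated and no DNS lookup is made.
    static String toIp(String hostOrIp) throws UnknownHostException {
        return InetAddress.getByName(hostOrIp).getHostAddress();
    }

    // Compare the configured controller address with the remote address, IP-to-IP.
    static boolean isFromController(String configuredController, String remoteIp) {
        try {
            return toIp(configuredController).equals(toIp(remoteIp));
        } catch (UnknownHostException e) {
            // An address that cannot be resolved is treated as a mismatch.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isFromController("127.0.0.1", "127.0.0.1"));
    }
}
```

A host name that resolves to several addresses would need a comparison against all of them (`InetAddress.getAllByName`); the sketch only checks the first one.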
Lampros Smyrnaios
7b7dd59b57
- Increase the "max_heap_size".
- Update a dependency.
- Update README.md
2023-04-28 19:37:12 +03:00
Lampros Smyrnaios
0997558347
Update dependencies.
2023-04-20 15:39:15 +03:00
Lampros Smyrnaios
796e46bc99
Update dependencies.
2023-03-27 19:44:49 +03:00
Lampros Smyrnaios
ff4fd3d289
- Show the elapsed time for each assignments-request to be processed by the Worker.
- Update dependencies.
2023-03-02 17:34:44 +02:00
Lampros Smyrnaios
66d3f7bcb2
- Show a warning, in case the number of results is different from the number of the assignments (due to missing / double logging).
- Update Spring.
2023-02-24 23:27:02 +02:00
Lampros Smyrnaios
81b61b530f
Drastically improve performance by applying a pre-processing algorithm to the assignments-list, which opens some "space" between assignments that have the same domain. This, in turn, causes the threads to block less during execution.
(The threads block, due to the mandatory "politeness-delay" before reconnecting with the same domain, in order to avoid overloading the remote servers.)
2023-02-24 23:23:37 +02:00
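One way to open "space" between same-domain assignments, as commit 81b61b530f describes, is a round-robin interleave over per-domain queues. The sketch below is only an illustration under that assumption; the commit message does not say which algorithm the project uses. The names `DomainSpacer`, `domainOf` and `spreadByDomain` are hypothetical, and assignments are simplified to plain URL strings.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the project's implementation.
class DomainSpacer {

    // Return the host part of a URL, or the raw string if it cannot be parsed.
    static String domainOf(String url) {
        try {
            String host = URI.create(url).getHost();
            return (host != null) ? host : url;
        } catch (IllegalArgumentException e) {
            return url;
        }
    }

    // Reorder the URLs so that consecutive entries come from different domains where possible.
    static List<String> spreadByDomain(List<String> urls) {
        // Group the URLs by domain, keeping the first-seen order of the domains.
        Map<String, Deque<String>> byDomain = new LinkedHashMap<>();
        for (String url : urls)
            byDomain.computeIfAbsent(domainOf(url), d -> new ArrayDeque<>()).add(url);

        List<String> result = new ArrayList<>(urls.size());
        while (!byDomain.isEmpty()) {
            // Each pass takes one URL from every domain that still has URLs left.
            Iterator<Deque<String>> it = byDomain.values().iterator();
            while (it.hasNext()) {
                Deque<String> queue = it.next();
                result.add(queue.poll());
                if (queue.isEmpty())
                    it.remove();
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(spreadByDomain(Arrays.asList(
                "http://a.org/1", "http://a.org/2", "http://b.org/1")));
    }
}
```

When one domain dominates the list, its remaining URLs still end up adjacent at the tail of the result, so the politeness-delay continues to apply there.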
Lampros Smyrnaios
0626e85894
Update dependencies.
2023-02-15 16:18:33 +02:00
Lampros Smyrnaios
24b52fba63
- Refactor the initialization and configuration process and Spring-ify the project.
- Update Spring dependency.
2023-01-25 18:33:49 +02:00
Lampros Smyrnaios
fd62ac567e
- Add a new endpoint "getFullTextsImproved", which uses Facebook's [**Zstandard**](https://facebook.github.io/zstd/) compression algorithm and greatly improves the compression ratio and speed.
- Remove some dependencies.
2023-01-09 15:48:30 +02:00
Lampros Smyrnaios
778dc6e25c
- Improve the stability of "UriBuilder.getPublicIP()", by using a "HttpURLConnection" to increase the connection and read timeouts and avoid timeout-exceptions.
- Show the number of assignments which are requested from the Controller, in the log-message.
- Update Spring.
2023-01-03 18:43:26 +02:00
Lampros Smyrnaios
8c1daadad0
- Increase the "requestReadTimeoutDuration" to 5 hours.
- Improve gradle's performance.
2022-12-12 17:49:14 +02:00
Lampros Smyrnaios
6c17e86c70
Code polishing.
2022-12-09 12:53:08 +02:00
Lampros Smyrnaios
182d6153d4
- Set some optimization settings for gradle.
- Fix error-handling in "installAndRun.sh".
- Update dependencies.
2022-11-30 16:25:57 +02:00
Lampros Smyrnaios
01f12e2fe2
- Align with "PublicationsRetriever's" updated "couldRetry" and "wasValid" logic.
- Update dependencies.
2022-11-11 16:02:20 +02:00
Lampros Smyrnaios
90a69686cf
- When the Worker is about to shut-down, after deleting all the handled assignments' files, check for remaining full-texts in the local storage and warn the user. If no remaining files were found, then delete the parent fulltexts' directory.
- Polish the code.
2022-11-02 02:27:04 +02:00
Lampros Smyrnaios
d91732bc16
- Delete the cookies in the newly-supported CookieManager after each batch.
- Update the Spring-Security-code to use the "SecurityFilterChain", as the previous code was deprecated.
- Update dependencies.
- Code cleanup.
2022-06-27 17:58:02 +03:00
Lampros Smyrnaios
d6e94912a4
- Optimize zip-file creation.
- Update dependencies.
2022-05-26 15:24:36 +03:00
Lampros Smyrnaios
31af0a81eb
- Update the Worker's report to include the datasourceID for each record. It is used by the Controller inside the S3-fileNames.
- Update dependencies.
2022-04-01 19:42:32 +03:00
Lampros Smyrnaios
5fee05e994
Update dependencies.
2022-03-28 14:29:54 +03:00
Lampros Smyrnaios
8453c742f2
Update Spring dependencies.
2022-02-25 17:41:10 +02:00
Lampros Smyrnaios
a428b1d1e6
- Fix not prioritizing the gradle version defined inside the "installAndRun.sh" script.
- Update SpringBoot dependency.
2022-01-21 15:19:52 +02:00
Lampros Smyrnaios
1ddfd34236
- Allow the user to set a maximum number of assignments-batches for the Worker to handle. After handling those batches, the Worker will shut down. A number of < 0 > indicates an infinite number of batches.
- Avoid converting the zero fileSize to < null >. Now, the default value is < null >, so the zero-value will indicate a zero-byte file.
- Update dependencies.
- Code cleanup.
2021-12-24 00:12:34 +02:00
Lampros Smyrnaios
82d69f3bf5
- Calculate and set the max heap size with respect to the system resources, in "installAndRun.sh".
- Fix not setting the right "Error"-members when the docUrl was found, but the full-text was not retrieved.
- Set a "couldRetry"-indication in the "Error"-class when the full-text was retrieved, since, in general, a retry would give the same successful result.
- Update the "docFileNotRetrieved"-check to use the standardized string.
- Eliminate some possible NPEs.
- Update Gradle.
2021-12-16 02:04:05 +02:00
Lampros Smyrnaios
fd5b56e3c6
- Allow the user to set the "maxAssignmentsLimitPerBatch" value.
- Set increased lower and upper limits for the Java Heap Size.
- Update the "ServerBaseURL" to the Public IP Address of the machine which is running the app.
- Improve two log-messages.
2021-12-07 00:52:40 +02:00
Lampros Smyrnaios
6355b3e397
- Increase the "PublicationsRetriever.threadsMultiplier" to "6", as the threads are mostly network-blocked.
- Make sure the "maven" package is installed before compiling the "PublicationsRetriever" library.
- Update dependencies.
2021-11-30 01:02:06 +02:00
Lampros Smyrnaios
20b71164d5
- The worker will store the files in its local file-system and will send them to the controller in batches, after the latter requests them. When all files from a given assignments-num are sent, the files will be deleted from the Worker, in a scheduled-job.
- Implement the "getFullTexts"-endpoint, which returns the requested full-texts in a zip file.
- Implement the "getFullText"-endpoint, which returns the requested full-text.
- Implement the "getHandledAssignmentsCounts"-endpoint, which returns the assignments-numbers that were handled by that worker.
- Make sure each urlReport has the same "Date" for a given assignments-number. Also, make sure the "size" and "hash" have a "null" value, in case the full-text was not found.
- Check and log thread-pool shutdown errors.
- Write the stack-trace to the error-logs, instead of the Stderr.
- Update SpringBoot dependency.
- Change log levels.
- Code cleanup.
2021-11-26 17:04:31 +02:00
Lampros Smyrnaios
3220c97373
- Improve performance when requesting, processing and posting requests.
- Fix a bug, causing degraded performance when processing more than 3000 assignments.
- Fix the progress percentage shown in the logs.
- Avoid a potential NPE when processing a broken "Assignment" object.
- Update Spring to v.2.5.6.
- Code cleanup.
2021-10-30 17:14:18 +03:00