Commit Graph

23 Commits

Author SHA1 Message Date
Lampros Smyrnaios 64a1b7d4f0 - Comment-out the bucket-deletion process, in order to avoid any accidental deletion, even if it has to be explicitly allowed in the config.
- Update dependencies.
2024-04-26 12:54:00 +03:00
Lampros Smyrnaios f61cae41a1 - Try to get the cause of the exception of the callable-tasks which handle the bulk-import of fileSegments.
- Fix not counting the failedSegments when an exception was thrown.
- Code polishing.
2024-03-13 12:15:59 +02:00
Lampros Smyrnaios b72996c9a9 - Configure the destination of the logs in the "application.properties" file.
- Add some gradle files to be used by Jenkins.
2024-02-08 19:47:34 +02:00
Lampros Smyrnaios 34d7a143e7 Add/improve documentation. 2024-02-01 14:37:29 +02:00
Lampros Smyrnaios 2e60128084 - Allow to easily change the por used by workers.
- Show the number of active background-tasks and bulkImportDirs, which delay the Service's shutdown.
- Update dependencies.
- Code polishing.
2023-12-19 23:31:42 +02:00
Lampros Smyrnaios fb2877dbe8 Upgrade the execution system for the backgroundTasks:
- Submit each task immediately for execution, instead of waiting for a scheduling thread to send all gathered tasks (up to that point) to the ExecutorService (and block until they are finished, before it can start again).
- Hold the Future of each submitted task to a synchronized-list to check the result of each task at a scheduled time.
- Reduce the cpu-time to assure the Service can shut down, by checking if there are "actively" and "about-to-be-executed" tasks, at the same time. Instead of having to rely on the additional checking of the "shutdown"-status of each worker to verify that no active task exist.
- Improve the threads' shutdown procedure.
2023-10-09 17:23:59 +03:00
Lampros Smyrnaios ede7ca5a89 - Add bulk-import support for non-Authoritative data-sources.
- Update Spring Boot.
- Code polishing.
2023-09-26 18:02:48 +03:00
Lampros Smyrnaios b3e0d214fd Update the BulkImport API:
- Refactor the "bulkImportReportID".
- Add the "bulk:" prefix in the provenance value, in the DB.
- Fix not using correctly the "Lists.partition()" method.
- Make sure the "bulkImportDir" is removed from the "bulkImportDirsUnderProcessing" Set, in case of an early-error.
- Fix the "numFailedSegments"-calculation.
- Improve some messages.
- Code polishing.
2023-08-21 18:19:53 +03:00
Lampros Smyrnaios 66a5b3c7da Update Bulk-Import API:
- Increase the "numOfThreadsPerBulkImportProcedure" to 6.
- Fix Bulk import not working from a second-level subdirectory; the report-subDirectory was not created.
- Fix not returning the bulk-import-report as "application/json".
- Add useful messages for missing parameters.
- Change the HTTP-method for the "bulkImportFullTexts" endpoint to "POST".
- Show a structured json-response for the "bulkImportFullTexts" endpoint.
- Fix uncommon date-format.
- Remove single quotes from json-report, since they are returned as bytes, not characters.
- Optimize the generation of the json-bulkImport-report.
2023-07-25 11:59:47 +03:00
Lampros Smyrnaios cec2531737 - Increase the "numOfBackgroundThreads" to 8.
- Make the "numOfBackgroundThreads" and "numOfThreadsPerBulkImportProcedure" configurable from the "application.yml" file.
- Code polishing.
2023-07-21 11:45:50 +03:00
Lampros Smyrnaios 551c4acef5 Fix property naming missmatch. 2023-05-24 14:49:29 +03:00
Lampros Smyrnaios 8b5f143b0a Place the "workerReports" and the "bulkImportReports" dirs inside the "reports" parent-directory. 2023-05-24 14:10:57 +03:00
Lampros Smyrnaios cd1fb0af88 - Process the WorkerReports in background Jobs and post the reportResults to the Workers.
- Save the workerReports to json files, until they are processed successfully.
- Show some custom metrics in prometheus.
2023-05-24 13:52:28 +03:00
Lampros Smyrnaios 0ea3e2de24 Add the "shutdownService" and "cancelShutdownService" endpoints. The Controller sends the related requests to the Workers and shutdowns gracefully, after all workers have shutdown. 2023-05-24 13:42:29 +03:00
Lampros Smyrnaios c2a1b96069 - Rename the mounted "mnt/bulkImport/" directory to "/mnt/bulk_import/".
- Increase the "awaitTermination" timeout for the ExecutorService to 2 minutes.
2023-05-23 21:09:34 +03:00
Lampros Smyrnaios b6e8cd1889 New feature: BulkImport full-text files from compatible datasources. 2023-05-11 03:07:55 +03:00
Lampros Smyrnaios 55ea5118ac - Update the "testDatabaseName" property.
- Code polishing.
2023-04-26 19:33:28 +03:00
Lampros Smyrnaios c39fef2654 Upgrade payload-table to payload-view which consists of three separate payload tables: "payload_legacy", "payload_aggregated" and "payload_bulk_import". 2023-04-10 15:55:50 +03:00
Lampros Smyrnaios 882c6f447b Update the "testDatabaseName". 2023-03-21 23:10:21 +02:00
Lampros Smyrnaios 4280f89296 - Set the default value of the "isTestEnvironment" property to "true", in order to avoid undesired outcomes in the production db.
- Code polishing.
2023-03-21 17:04:28 +02:00
Lampros Smyrnaios e975bec911 - Add Prometheus and Grafana which help measuring various metrics for the Controller's health and performance.
- Fix Docker config still using the old (now removed) "application.properties" file.
- Simplify the process of building and running the docker image; Now we use docker compose to run the Controller, along with the Prometheus and Grafana. Also, now it is not requested from the user to login and push the image (this may change in the future).
2023-03-21 16:46:33 +02:00
Lampros Smyrnaios 003c0bf179 - Add support for excluding specific datasources from being crawled. These datasources may be aggregated through bulk-imports, by other pieces of software. Such a datasource is "arXiv.org".
- Fix an issue, where the "datasource-type" was retrieved instead of the "datasource-name".
- Polish the "findAssignmentsQuery".
2023-03-21 07:19:35 +02:00
Lampros Smyrnaios f835a752bf Transform the "application.properties" file to "application.yml" and optimize the property-trees. 2023-03-20 15:23:00 +02:00