Commit Graph

17 Commits

Author SHA1 Message Date
Lampros Smyrnaios e8644cb64f - Optimize the "insertAssignmentsQuery".
- Add documentation about the Prometheus Metrics, in README.
- Update Dependencies.
- Code polishing.
2023-07-05 17:10:30 +03:00
Lampros Smyrnaios 0f4b63c4a9 Expose the following statistics as prometheus-metrics and create/update a stats-endpoint for each one:
- "numOfPayloadsAggregatedByServiceThroughCrawling"
 - "numOfPayloadsAggregatedByServiceThroughBulkImport"
 - "numOfPayloadsAggregatedByService"
 - "numOfLegacyPayloads"
 - "numOfRecordsInspectedByServiceThroughCrawling" (renamed from "numOfInspectedRecords")
2023-06-23 15:22:26 +03:00
Lampros Smyrnaios b9712bed85 - Expose the "numOfAllPayloads" and "numOfInspectedRecords" DB-stats to Prometheus, by using a scheduling task to request the numbers from the DB, every 6 hours.
- Update the "StatsServiceImpl.getNumberOfPayloadsAggregatedByService()" to use the new table "payload_aggregated", instead of casting and checking the date of the records.
- Code polishing.
2023-06-19 14:42:00 +03:00
Lampros Smyrnaios 798fa09d68 - Identify and handle a possible Worker-crash, in "UrlsServiceImpl.postReportResultToWorker()".
- Add/Improve some log messages.
- Update and cleanup dependencies.
- Code polishing.
2023-06-15 23:19:36 +03:00
Lampros Smyrnaios 6669dc61bf - Increase the initialDelay for the "checkIfServiceIsReadyForShutdown" scheduled-task, in production, to 10 minutes.
- Code polishing.
2023-06-06 16:49:53 +03:00
Lampros Smyrnaios a38d6ace79 Code polishing. 2023-05-29 12:21:48 +03:00
Lampros Smyrnaios 74ff31fc64 - Show the workerIPs in the logs.
- Rename the "FullTexts"-files to "BulkImport".
2023-05-29 12:12:08 +03:00
Lampros Smyrnaios 3988eb3a48 - Use a separate HDFS sub-dir for every assignments-batch, in order to avoid any disrruptancies from multiple threads moving parquet-files from the same sub-dir. Multiple batches from the same worker may be processed at the same time. These sub-dirs are deleted afterwards.
- Treat the "contains no visible files" situation as an error. In which case the assignments-data is presumed to not have been inserted to the database tables.
- Code polishing/cleanup.
2023-05-27 02:36:05 +03:00
Lampros Smyrnaios 02cee097d4 Fix an issue, which could cause some background jobs to be executed more than 1 times. The previously executed jobs were not deleted from the global list fast enough, and they would be selected again, in case they were not finished before the scheduler started again. 2023-05-26 13:08:00 +03:00
Lampros Smyrnaios 2b50e08bf6 - Handle the case, were multiple threads may load the same HDFS directory to a database table, thus causing the "directory contains no visible files"-SQLException.
- Improve the values of the delays for some scheduledTasks.
- Improve elapsed time precision for the "lastAccessedOn" metadata of the workerReports.
- Code polishing.
2023-05-25 00:34:36 +03:00
Lampros Smyrnaios 164245cb53 - Automatically delete the unsuccessful WorkerReports, which are more than 7 days old.
- Optimize the Service's startup speed, by setting "initialDelays" to the scheduled tasks.
- Optimize documentation.
2023-05-24 16:59:42 +03:00
Lampros Smyrnaios 0ea3e2de24 Add the "shutdownService" and "cancelShutdownService" endpoints. The Controller sends the related requests to the Workers and shutdowns gracefully, after all workers have shutdown. 2023-05-24 13:42:29 +03:00
Lampros Smyrnaios b6e8cd1889 New feature: BulkImport full-text files from compatible datasources. 2023-05-11 03:07:55 +03:00
Lampros Smyrnaios 4280f89296 - Set the default value of the "isTestEnvironment" property to "true", in order to avoid undesired outcomes in the production db.
- Code polishing.
2023-03-21 17:04:28 +02:00
Lampros Smyrnaios 33ba3e8d91 - Avoid getting and uploading (to S3), full-texts which are already uploaded by previous assignments-batches.
- Fix not updating the fileLocation with the s3Url for records which share the same full-text.
- Set only one delete-order for each assignments-batch-files, not one (or more, by mistake) per zip-batch.
- Set the HttpStatus to "204 - NO_CONTENT", when no assignments are available to be returned to the Worker.
- Fix not unlocking the "dataBaseLock" in case of a "dataBase-connection"-error, in "addWorkerReport()".
- Improve some log-messages.
- Change the log-level for the "S3-bucket already exists" message.
- Update Gradle.
- Optimize imports.
- Code cleanup.
2021-12-21 15:55:27 +02:00
Lampros Smyrnaios d100af35d0 - Implement the "getUrls" and "addWorkerReport" endpoints with full database-handling.
- Add connectivity with an Impala-database and create a dedicated Controller for future statistics-requests.
- Optimize the "getTestUrls"-endpoint.
- Disable the "reportCurrentTime()" scheduled-task.
- Update dependencies and bump project's version to '1.0.0-SNAPSHOT'.
- Set the logging-appender to "File".
- Code cleanup.
2021-11-09 23:59:27 +02:00
Lampros Smyrnaios 8a4376da9c Initial commit of UrlsController. 2021-03-16 15:25:15 +02:00