Lampros Smyrnaios
44459c8681
- Rename "ImpalaConnector.java" to "DatabaseConnector.java".
- Update dependencies.
- Code polishing.
2023-08-23 16:55:23 +03:00
Lampros Smyrnaios
8d8a387ff2
Reduce the waiting time for new background tasks to be scheduled for processing.
2023-07-24 20:33:56 +03:00
Lampros Smyrnaios
9cbac77c2a
- Add a check for "shouldShutdownService" before allowing a bulk-import request to continue.
- Add check for remaining background tasks (including bulkImports), before checking if the workers have shut down and then shut down the Service.
2023-07-21 16:19:00 +03:00
Lampros Smyrnaios
b94c35c66e
- Fix double active "@Scheduled" annotation for the "ScheduledTasks.updatePrometheusMetrics()" method.
- Code polishing.
2023-07-13 18:32:45 +03:00
Lampros Smyrnaios
e8644cb64f
- Optimize the "insertAssignmentsQuery".
- Add documentation about the Prometheus Metrics, in README.
- Update Dependencies.
- Code polishing.
2023-07-05 17:10:30 +03:00
Lampros Smyrnaios
0f4b63c4a9
Expose the following statistics as prometheus-metrics and create/update a stats-endpoint for each one:
- "numOfPayloadsAggregatedByServiceThroughCrawling"
- "numOfPayloadsAggregatedByServiceThroughBulkImport"
- "numOfPayloadsAggregatedByService"
- "numOfLegacyPayloads"
- "numOfRecordsInspectedByServiceThroughCrawling" (renamed from "numOfInspectedRecords")
2023-06-23 15:22:26 +03:00
Lampros Smyrnaios
b9712bed85
- Expose the "numOfAllPayloads" and "numOfInspectedRecords" DB-stats to Prometheus, by using a scheduling task to request the numbers from the DB, every 6 hours.
- Update the "StatsServiceImpl.getNumberOfPayloadsAggregatedByService()" to use the new table "payload_aggregated", instead of casting and checking the date of the records.
- Code polishing.
2023-06-19 14:42:00 +03:00
Lampros Smyrnaios
798fa09d68
- Identify and handle a possible Worker-crash, in "UrlsServiceImpl.postReportResultToWorker()".
- Add/Improve some log messages.
- Update and cleanup dependencies.
- Code polishing.
2023-06-15 23:19:36 +03:00
Lampros Smyrnaios
6669dc61bf
- Increase the initialDelay for the "checkIfServiceIsReadyForShutdown" scheduled-task, in production, to 10 minutes.
- Code polishing.
2023-06-06 16:49:53 +03:00
Lampros Smyrnaios
a38d6ace79
Code polishing.
2023-05-29 12:21:48 +03:00
Lampros Smyrnaios
74ff31fc64
- Show the workerIPs in the logs.
- Rename the "FullTexts"-files to "BulkImport".
2023-05-29 12:12:08 +03:00
Lampros Smyrnaios
3988eb3a48
- Use a separate HDFS sub-dir for every assignments-batch, in order to avoid any discrepancies caused by multiple threads moving parquet-files from the same sub-dir. Multiple batches from the same worker may be processed at the same time. These sub-dirs are deleted afterwards.
- Treat the "contains no visible files" situation as an error, in which case the assignments-data is presumed not to have been inserted into the database tables.
- Code polishing/cleanup.
2023-05-27 02:36:05 +03:00
Lampros Smyrnaios
02cee097d4
Fix an issue which could cause some background jobs to be executed more than once. The previously executed jobs were not deleted from the global list fast enough, so they would be selected again if they had not finished before the scheduler started again.
2023-05-26 13:08:00 +03:00
Lampros Smyrnaios
2b50e08bf6
- Handle the case where multiple threads may load the same HDFS directory to a database table, thus causing the "directory contains no visible files"-SQLException.
- Improve the values of the delays for some scheduledTasks.
- Improve elapsed time precision for the "lastAccessedOn" metadata of the workerReports.
- Code polishing.
2023-05-25 00:34:36 +03:00
Lampros Smyrnaios
164245cb53
- Automatically delete the unsuccessful WorkerReports, which are more than 7 days old.
- Optimize the Service's startup speed, by setting "initialDelays" to the scheduled tasks.
- Optimize documentation.
2023-05-24 16:59:42 +03:00
Lampros Smyrnaios
0ea3e2de24
Add the "shutdownService" and "cancelShutdownService" endpoints. The Controller sends the related requests to the Workers and shuts down gracefully, after all workers have shut down.
2023-05-24 13:42:29 +03:00
Lampros Smyrnaios
b6e8cd1889
New feature: BulkImport full-text files from compatible datasources.
2023-05-11 03:07:55 +03:00
Lampros Smyrnaios
4280f89296
- Set the default value of the "isTestEnvironment" property to "true", in order to avoid undesired outcomes in the production db.
- Code polishing.
2023-03-21 17:04:28 +02:00
Lampros Smyrnaios
33ba3e8d91
- Avoid getting and uploading (to S3) full-texts which were already uploaded by previous assignments-batches.
- Fix not updating the fileLocation with the s3Url for records which share the same full-text.
- Set only one delete-order for the files of each assignments-batch, not one (or more, by mistake) per zip-batch.
- Set the HttpStatus to "204 - NO_CONTENT", when no assignments are available to be returned to the Worker.
- Fix not unlocking the "dataBaseLock" in case of a "dataBase-connection"-error, in "addWorkerReport()".
- Improve some log-messages.
- Change the log-level for the "S3-bucket already exists" message.
- Update Gradle.
- Optimize imports.
- Code cleanup.
2021-12-21 15:55:27 +02:00
Lampros Smyrnaios
d100af35d0
- Implement the "getUrls" and "addWorkerReport" endpoints with full database-handling.
- Add connectivity with an Impala-database and create a dedicated Controller for future statistics-requests.
- Optimize the "getTestUrls"-endpoint.
- Disable the "reportCurrentTime()" scheduled-task.
- Update dependencies and bump project's version to '1.0.0-SNAPSHOT'.
- Set the logging-appender to "File".
- Code cleanup.
2021-11-09 23:59:27 +02:00
Lampros Smyrnaios
8a4376da9c
Initial commit of UrlsController.
2021-03-16 15:25:15 +02:00