Commit Graph

32 Commits

Author SHA1 Message Date
Lampros Smyrnaios 6226e2298d - Upgrade the results-loading process: Instead of making thousands of sql-insert requests to Impala now we write the results to parquet files, upload them to HDFS and then import the data into the Impala tables with just 2 requests. This results in a huge performance improvement.
One side effect of using the parquet-files is that the timestamps are now BIGDECIMAL numbers instead of "Timestamp" objects; however, converting them to such objects is easy, if we ever need to do it.
- Code polishing.
2022-11-10 17:18:21 +02:00
Lampros Smyrnaios 6a03103b79 Update dependencies. 2022-11-10 16:50:21 +02:00
Lampros Smyrnaios a2cd02115f - Update the Spring-Security-code to use the "SecurityFilterChain", as the previous code was deprecated.
- Update dependencies.
2022-06-27 21:41:32 +03:00
Lampros Smyrnaios e3b374a32f - Optimize file-related tasks.
- Update dependencies.
- Code cleanup.
2022-05-26 15:43:59 +03:00
Lampros Smyrnaios 9b95eebb6c - Remove the obsolete "parenthesis" and "increasing duplicate-num" from the full-texts' names, before sending them to the S3-Object-Store. They now end with the "file-hash", so it is guaranteed that they will be unique. The Worker continues to produce the previous kind of names, without any disturbance.
- Improve logging.
- Update MinIO dependency.
2022-04-11 21:15:22 +03:00
Lampros Smyrnaios 33fc61a8d9 - Fix the fileName-ID not being directly related to the datasourceID in the S3-ObjectStore name. Add explanatory comments.
- Add missing error-logs.
2022-04-05 16:22:02 +03:00
Lampros Smyrnaios 5e4fad2479 - Change the fileNames' structure in the S3-ObjectStore.
- Update dependencies.
2022-04-01 19:24:04 +03:00
Lampros Smyrnaios 48670f3399 - Show the percentage of the "NumFullTextsFound", in the logs.
- Update dependencies.
2022-03-28 14:29:31 +03:00
Lampros Smyrnaios e587b2ca6c Update Spring dependencies. 2022-02-25 17:41:24 +02:00
Lampros Smyrnaios 71f6b46130 - In case of an error when creating the "current_assignment" table (e.g. out of memory in the backend database server), check for partial creation and drop it. Also, in any case, before we drop this table, check whether it exists first (in general it should always exist, unless the creation results in an error and the table was not created at all).
- Fix an error-message.
- Update dependencies.
- Code cleanup.
2022-02-14 12:36:00 +02:00
Antonis Lempesis 6dde8c0faa finished merge 2022-01-31 04:17:16 +02:00
Antonis Lempesis e47fd8d97b merged refactor branch 2022-01-30 23:10:06 +02:00
Antonis Lempesis bf26bf955f springified project 2022-01-30 22:14:52 +02:00
Lampros Smyrnaios 92b11baf93 - Update the repository for the Impala JDBC Driver.
- Code cleanup.
2022-01-28 00:59:19 +02:00
Antonis Lempesis 91f460ce51 moved impala jar to omtd repository and updated build file 2022-01-28 00:41:29 +02:00
Lampros Smyrnaios 3c9f8870d1 - Change the repository for the Impala JDBC Driver, as the previous one had networking issues.
- Optimize the "findAssignmentsQuery".
2022-01-26 19:52:46 +02:00
Lampros Smyrnaios 8d9336fa52 Update dependencies. 2022-01-21 15:04:29 +02:00
Lampros Smyrnaios 82bf11b9b3 - Work around a bug in the Impala-JDBC-Driver when creating insert-prepared-statements.
- Update dependencies.
2021-12-24 00:25:50 +02:00
Lampros Smyrnaios a46ab84f10 - Increase the lower and upper limits for the Java Heap Size.
- Update the "ServerBaseURL" to the Public IP Address of the machine which is running the app.
2021-12-06 20:27:39 +02:00
Lampros Smyrnaios dea257b87f - Fix a bug which caused the get-full-texts request to fail because of the wrong "requestAssignmentsCounter".
- Fix a bug which caused multiple workers to get assigned the same batch-counter, while the assignment-tasks were different.
- Set a max-size limit to the amount of space the logs can use. Over that size, the older logs will be deleted.
- Show the error-message returned from the Worker, when a getFullTexts-request fails.
- Improve some log-messages.
- Update dependencies.
- Code cleanup.
2021-12-06 20:18:30 +02:00
Lampros Smyrnaios 48eed20dd8 - Implement the "getAndUploadFullTexts" functionality. In order to access the S3-ObjectStore from one trusted place, the Controller will request the files from the workers and upload them to S3. Afterwards, the workers will delete those files from their local storage. Previously, each worker uploaded its own files.
- Move the "mergeParquetFiles" and "getCutBatchExceptionMessage" methods inside the "FileUtils" class.
- Code cleanup.
2021-11-30 18:23:27 +02:00
Lampros Smyrnaios 780ed15ce2 - Fix a "databaseLock" bug, which could cause both the payload and attempt inserts and the "mergeParquetFiles" to fail, as the inserts could be executed concurrently with tables-compaction.
- Fix the "null" representation of an "unknown" payload-size in the database.
- Remove the obsolete thread-locking for the "CreateDatabase" operation. This code is guaranteed to run BEFORE any other operation in the database.
- Implement the "handlePreparedStatementException" and "closeConnection" methods.
- Improve error-logs.
- Update dependencies.
- Code cleanup.
2021-11-30 13:26:19 +02:00
Lampros Smyrnaios d100af35d0 - Implement the "getUrls" and "addWorkerReport" endpoints with full database-handling.
- Add connectivity with an Impala-database and create a dedicated Controller for future statistics-requests.
- Optimize the "getTestUrls"-endpoint.
- Disable the "reportCurrentTime()" scheduled-task.
- Update dependencies and bump project's version to '1.0.0-SNAPSHOT'.
- Set the logging-appender to "File".
- Code cleanup.
2021-11-09 23:59:27 +02:00
Lampros Smyrnaios 0d47c33a08 - Improve logging configurations.
- Fix the "AssignmentResponse.toString()" method.
- Update SpringBoot to v.2.5.6
- Code cleanup.
2021-11-04 11:57:19 +02:00
Lampros Smyrnaios 983b900da7 - Add the "installAndRun.sh" script.
- Update the README.
- Update the dependencies.
2021-09-09 15:56:37 +03:00
Lampros Smyrnaios 25c566bf68 - Change server's port.
- Update dependencies.
2021-07-29 08:44:36 +03:00
Lampros Smyrnaios 308cab5ecd - Return an HTTP-500-error when the server cannot find the resourceFile requested by the "getTestUrls"-endpoint.
- Close the "inputScanner" after each use when retrieving the test-tasks.
- Show info-logs when sending an assignment to a worker.
- Code cleanup.
2021-06-10 14:21:39 +03:00
Lampros Smyrnaios 87044574b5 Update dependencies and add the "gradle-wrapper.properties" file which defines the gradle version. 2021-06-08 19:12:40 +03:00
Lampros Smyrnaios d20fcf9cce - Update the "getUrls" and "getTestUrls" endpoints to take the data from the workers, in http-request-parameters instead of the http-body.
- Fix tasksLimit-check.
- Code cleanup.
2021-05-19 02:32:46 +03:00
Lampros Smyrnaios e2cc320baf - Add the "getTestUrls"-endpoint which returns an "Assignment" with data retrieved from the added resource-file.
- Update the "getUrls"-endpoint to be ready to retrieve data from the database, once it's added.
- Update the dependencies.
- Code cleanup.
2021-05-18 17:23:20 +03:00
Lampros Smyrnaios c6e12d3e95 - Update "addResults"-endpoint.
- Add "UrlsRequest.java".
- Some minor updates in "build.gradle" and "application.properties".
2021-03-16 18:07:30 +02:00
Lampros Smyrnaios 8a4376da9c Initial commit of UrlsController. 2021-03-16 15:25:15 +02:00