Lampros Smyrnaios
6226e2298d
- Upgrade the results-loading process: instead of making thousands of SQL-insert requests to Impala, we now write the results to parquet files, upload them to HDFS and then import the data into the Impala tables with just 2 requests. This results in a huge performance improvement.
One side effect of using the parquet files is that the timestamps are now BIGDECIMAL numbers instead of "Timestamp" objects, but converting them to such objects is pretty easy, if we ever need to do it.
- Code polishing.
2022-11-10 17:18:21 +02:00
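The timestamp conversion mentioned in the commit above could look roughly like the following. This is a minimal sketch, not code from the repository: the class and method names are hypothetical, and it assumes the numeric value holds milliseconds since the Unix epoch, which the commit message does not state.

```java
import java.math.BigDecimal;
import java.sql.Timestamp;

public class TimestampConversion {

    /**
     * Converts a numeric timestamp into a java.sql.Timestamp.
     * Assumption: the value is a whole number of milliseconds since the Unix epoch.
     * longValueExact() throws an ArithmeticException if the value has a
     * fractional part or does not fit in a long, so bad input fails loudly.
     */
    public static Timestamp toTimestamp(BigDecimal epochMillis) {
        return new Timestamp(epochMillis.longValueExact());
    }

    public static void main(String[] args) {
        // Hypothetical sample value, used only for illustration.
        BigDecimal stored = new BigDecimal("1668093501000");
        Timestamp ts = toTimestamp(stored);
        System.out.println(ts.getTime());
    }
}
```

If the stored values turn out to use a different unit (seconds or microseconds, for example), the value would need to be scaled before constructing the `Timestamp`.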
Lampros Smyrnaios
6a03103b79
Update dependencies.
2022-11-10 16:50:21 +02:00
Lampros Smyrnaios
a2cd02115f
- Update the Spring-Security-code to use the "SecurityFilterChain", as the previous code was deprecated.
- Update dependencies.
2022-06-27 21:41:32 +03:00
Lampros Smyrnaios
e3b374a32f
- Optimize file-related tasks.
- Update dependencies.
- Code cleanup.
2022-05-26 15:43:59 +03:00
Lampros Smyrnaios
9b95eebb6c
- Remove the obsolete "parenthesis" and "increasing duplicate-num" from the full-texts' names, before sending them to the S3-Object-Store. They now end with the "file-hash", so it is guaranteed that they will be unique. The Worker continues to produce the previous kind of names, without any disturbance.
- Improve logging.
- Update MinIO dependency.
2022-04-11 21:15:22 +03:00
Lampros Smyrnaios
33fc61a8d9
- Fix the fileName-ID not being directly related to the datasourceID, in the S3-ObjectStore name. Add explanatory comments.
- Add missing error-logs.
2022-04-05 16:22:02 +03:00
Lampros Smyrnaios
5e4fad2479
- Change the fileNames' structure in the S3-ObjectStore.
- Update dependencies.
2022-04-01 19:24:04 +03:00
Lampros Smyrnaios
48670f3399
- Show the percentage of the "NumFullTextsFound", in the logs.
- Update dependencies.
2022-03-28 14:29:31 +03:00
Lampros Smyrnaios
e587b2ca6c
Update Spring dependencies.
2022-02-25 17:41:24 +02:00
Lampros Smyrnaios
71f6b46130
- In case of an error when creating the "current_assignment" table (e.g. out of memory in the backend database server), check for partial creation and drop it. Also, in any case, before we drop this table, now check if it exists first (in general it should always exist, unless the creation results in an error and the table was not created at all).
- Fix an error-message.
- Update dependencies.
- Code cleanup.
2022-02-14 12:36:00 +02:00
Antonis Lempesis
6dde8c0faa
finished merge
2022-01-31 04:17:16 +02:00
Antonis Lempesis
e47fd8d97b
merged refactor branch
2022-01-30 23:10:06 +02:00
Antonis Lempesis
bf26bf955f
springified project
2022-01-30 22:14:52 +02:00
Lampros Smyrnaios
92b11baf93
- Update the repository for the Impala JDBC Driver.
- Code cleanup.
2022-01-28 00:59:19 +02:00
Antonis Lempesis
91f460ce51
moved impala jar to omtd repository and updated build file
2022-01-28 00:41:29 +02:00
Lampros Smyrnaios
3c9f8870d1
- Change the repository for the Impala JDBC Driver, as the previous one had networking issues.
- Optimize the "findAssignmentsQuery".
2022-01-26 19:52:46 +02:00
Lampros Smyrnaios
8d9336fa52
Update dependencies.
2022-01-21 15:04:29 +02:00
Lampros Smyrnaios
82bf11b9b3
- Work around a bug in the Impala-JDBC-Driver, when creating insert-prepared-statements.
- Update dependencies.
2021-12-24 00:25:50 +02:00
Lampros Smyrnaios
a46ab84f10
- Increase the lower and upper limits for the Java Heap Size.
- Update the "ServerBaseURL" to the Public IP Address of the machine which is running the app.
2021-12-06 20:27:39 +02:00
Lampros Smyrnaios
dea257b87f
- Fix a bug, which caused the get-full-texts request to fail, because of the wrong "requestAssignmentsCounter".
- Fix a bug, which caused multiple workers to get assigned the same batch-counter, while the assignment-tasks were different.
- Set a max-size limit to the amount of space the logs can use. Over that size, the older logs will be deleted.
- Show the error-message returned from the Worker, when a getFullTexts-request fails.
- Improve some log-messages.
- Update dependencies.
- Code cleanup.
2021-12-06 20:18:30 +02:00
Lampros Smyrnaios
48eed20dd8
- Implement the "getAndUploadFullTexts" functionality. In order to access the S3-ObjectStore from one trusted place, the Controller will request the files from the workers and upload them to S3. Afterwards, the workers will delete those files from their local storage. Previously, each worker uploaded its own files.
- Move the "mergeParquetFiles" and "getCutBatchExceptionMessage" methods inside the "FileUtils" class.
- Code cleanup.
2021-11-30 18:23:27 +02:00
Lampros Smyrnaios
780ed15ce2
- Fix a "databaseLock" bug, which could cause both the payload and attempt inserts and the "mergeParquetFiles" to fail, as the inserts could be executed concurrently with tables-compaction.
- Fix the "null" representation of an "unknown" payload-size in the database.
- Remove the obsolete thread-locking for the "CreateDatabase" operation. This code is guaranteed to run BEFORE any other operation in the database.
- Implement the "handlePreparedStatementException" and "closeConnection" methods.
- Improve error-logs.
- Update dependencies.
- Code cleanup.
2021-11-30 13:26:19 +02:00
Lampros Smyrnaios
d100af35d0
- Implement the "getUrls" and "addWorkerReport" endpoints with full database-handling.
- Add connectivity with an Impala-database and create a dedicated Controller for future statistics-requests.
- Optimize the "getTestUrls"-endpoint.
- Disable the "reportCurrentTime()" scheduled-task.
- Update dependencies and bump project's version to '1.0.0-SNAPSHOT'.
- Set the logging-appender to "File".
- Code cleanup.
2021-11-09 23:59:27 +02:00
Lampros Smyrnaios
0d47c33a08
- Improve logging configurations.
- Fix the "AssignmentResponse.toString()" method.
- Update SpringBoot to v.2.5.6
- Code cleanup.
2021-11-04 11:57:19 +02:00
Lampros Smyrnaios
983b900da7
- Add the "installAndRun.sh" script.
- Update the README.
- Update the dependencies.
2021-09-09 15:56:37 +03:00
Lampros Smyrnaios
25c566bf68
- Change server's port.
- Update dependencies.
2021-07-29 08:44:36 +03:00
Lampros Smyrnaios
308cab5ecd
- Return an HTTP-500-error when the server cannot find the resourceFile requested by the "getTestUrls"-endpoint.
- Close the "inputScanner" after each use when retrieving the test-tasks.
- Show info-logs when sending an assignment to a worker.
- Code cleanup.
2021-06-10 14:21:39 +03:00
Lampros Smyrnaios
87044574b5
Update dependencies and add the "gradle-wrapper.properties" file which defines the gradle version.
2021-06-08 19:12:40 +03:00
Lampros Smyrnaios
d20fcf9cce
- Update the "getUrls" and "getTestUrls" endpoints to take the data from the workers, in http-request-parameters instead of the http-body.
- Fix tasksLimit-check.
- Code cleanup.
2021-05-19 02:32:46 +03:00
Lampros Smyrnaios
e2cc320baf
- Add the "getTestUrls"-endpoint which returns an "Assignment" with data retrieved from the added resource-file.
- Update the "getUrls"-endpoint to be ready to retrieve data from the database, once it's added.
- Update the dependencies.
- Code cleanup.
2021-05-18 17:23:20 +03:00
Lampros Smyrnaios
c6e12d3e95
- Update "addResults"-endpoint.
- Add "UrlsRequest.java".
- Some minor updates in "build.gradle" and "application.properties".
2021-03-16 18:07:30 +02:00
Lampros Smyrnaios
8a4376da9c
Initial commit of UrlsController.
2021-03-16 15:25:15 +02:00