Commit Graph

28 Commits

Author SHA1 Message Date
Lampros Smyrnaios 9b95eebb6c - Remove the obsolete "parenthesis" and "increasing duplicate-num" from the full-texts' names, before sending them to the S3-Object-Store. They now end with the "file-hash", so it is guaranteed that they will be unique. The Worker continues to produce the previous kind of names, without any disturbance.
- Improve logging.
- Update MinIO dependency.
2022-04-11 21:15:22 +03:00
Lampros Smyrnaios a81ed3c60f - Add an "isTestEnvironment"-switch, which makes it easier to work with production and test databases.
- In case the Worker cannot be reached during a full-texts' batch request, abort the rest of the batches.
- Fix memory leaks when unzipping the batch-zip-file.
- Add explanatory comments for picking the database related to a full-text file.
2022-04-08 17:39:45 +03:00
Lampros Smyrnaios 33fc61a8d9 - Fix the fileName-ID not being directly related with the datasourceID, in the S3-ObjectStore name. Add explanatory comments.
- Add missing error-logs.
2022-04-05 16:22:02 +03:00
Lampros Smyrnaios a23c918a42 - Fix a "@JsonProperty" annotation inside "Payload.java".
- Fix a "@Value" annotation inside "FileUtils.java".
- Add a new database and show its name along with the initial's name in the logs.
- Code cleanup and improvement.
2022-04-05 00:01:44 +03:00
Lampros Smyrnaios 5e4fad2479 - Change the fileNames' structure in the S3-ObjectStore.
- Update dependencies.
2022-04-01 19:24:04 +03:00
Lampros Smyrnaios 48670f3399 - Show the percentage of the "NumFullTextsFound", in the logs.
- Update dependencies.
2022-03-28 14:29:31 +03:00
Lampros Smyrnaios 88acaae20f - Replace the "numFullTextUrlsFound"-counter with "numFullTextsFound"-counter to reflect the end result of the actually available full-texts (which were downloaded by the Worker).
- Optimize the gather-fileNames loop.
- Improve a message in "installAndRun.sh"
2022-02-23 17:40:06 +02:00
Lampros Smyrnaios ad5dbdde9b - Improve performance when inserting records into the "attempt" table, by splitting the records equally, across more threads.
- Bring back the "UriBuilder", which informs us in the logs, about the Controller's url (IP, PORT, API).
- Code cleanup.
2022-02-22 13:54:16 +02:00
Lampros Smyrnaios d2ed9cd9ed Improve efficiency and performance when processing the full-texts. 2022-02-08 15:02:13 +02:00
Lampros Smyrnaios 1111c850b9 - Add support for more than one full-text per id. Allow recognizing fileName additions: "id(1).pdf", "id(2).pdf", etc.
- Fix not giving the databaseName in the "ImpalaController.get10PublicationIdsTest()".
- Improve consistency in the "maxAttemptsPerRecord" value, among different threads. Also, reduce the value-increase by one.
- Check if the tableName string is empty, in the "mergeParquetFiles".
- Improve error-logging.
- Set some local variables to "final", optimizing code-execution by the JVM.
2022-02-07 13:57:09 +02:00
Lampros Smyrnaios 6aab1d242b - Improve performance when handling WorkerReports' database insertions, by using parallelism to insert to two different tables in the same time. Also, pre-cache the query-argument-types.
- Update the error-message and counting system, on partial insertion event.
2022-02-04 14:48:22 +02:00
Lampros Smyrnaios be4898e43e Bug fixes and improvements:
- Fix an NPE, when the "getTestUrls"-endpoint is called. It was thrown because of an absent per-thread initialization of some thread-local variables.
- Fix JdbcTemplate error when querying the "getFileLocationForHashQuery".
- Fix the "S3ObjectStore.isLocationInStore" check.
- Fix not catching/handling some exceptions.
- Fix/improve log-messages.
- Optimize the "getFileLocationForHashQuery" to return only the first row. In the latest change, without this optimization, the query-result would cause non-handling the same-hash cases, because of an exception.
- Optimize the "ImpalaConnector.databaseLock.lock()" positioning.
- Update the "getTestUrls" api-path.
- Optimize list-allocation.
- Re-add the info-message about the successful emptying of the S3-bucket.
- Code cleanup.
2022-02-02 20:19:46 +02:00
Antonis Lempesis e9bede5c45 more fixes 2022-02-01 02:08:02 +02:00
Antonis Lempesis 6dde8c0faa finished merge 2022-01-31 04:17:16 +02:00
Antonis Lempesis bf26bf955f springified project 2022-01-30 22:14:52 +02:00
Lampros Smyrnaios d0ab42e4fa - Change the scheme of the file-location URI.
- Move the old and the current database names in the "application.properties" file.
- Improve logging.
2022-01-28 07:24:42 +02:00
Lampros Smyrnaios ab99bc6168 - Make sure the temp table "current_assignment" from a cancelled previous execution, is dropped and purged on startup.
- Improve logging.
- Code cleanup.
2022-01-19 01:37:47 +02:00
Lampros Smyrnaios 2cf25b0d26 - Fix Impala "broken pipe" error, by closing the connection when not in need. The connection is reopened later with minimal overhead, as a connection pool is used.
- Fix not closing the database-connection in case of a specific error (also in a commented error-case).
2022-01-13 00:47:15 +02:00
Lampros Smyrnaios 82bf11b9b3 - Workaround a bug of Impala-JDBC-Driver, when creating insert-prepared-statements.
- Update dependencies.
2021-12-24 00:25:50 +02:00
Lampros Smyrnaios 33ba3e8d91 - Avoid getting and uploading (to S3), full-texts which are already uploaded by previous assignments-batches.
- Fix not updating the fileLocation with the s3Url for records which share the same full-text.
- Set only one delete-order for each assignments-batch-files, not one (or more, by mistake) per zip-batch.
- Set the HttpStatus to "204 - NO_CONTENT", when no assignments are available to be returned to the Worker.
- Fix not unlocking the "dataBaseLock" in case of a "dataBase-connection"-error, in "addWorkerReport()".
- Improve some log-messages.
- Change the log-level for the "S3-bucket already exists" message.
- Update Gradle.
- Optimize imports.
- Code cleanup.
2021-12-21 15:55:27 +02:00
Lampros Smyrnaios 0178e44574 - Increase security by sanitizing the value of the "workerId" before use it in sql-statements. Impala has bugs with some types of PreparedStatements.
- Improve reliability, by dropping the "current_assignment" table in case of an error, thus the next "getUrls"-request will not fail.
- Fix the "databaseLock" not being unlocked when the "addWorkerReport()" method returned early on some error-cases.
- Delete the "assignment"-data after inserting the related payloads and attempts in the database.
2021-12-10 21:47:58 +02:00
Lampros Smyrnaios dea257b87f - Fix a bug, which caused the get-full-texts request to fail, because of the wrong "requestAssignmentsCounter".
- Fix a bug, which caused multiple workers to get assigned the same batch-counter, while the assignment-tasks where different.
- Set a max-size limit to the amount of space the logs can use. Over that size, the older logs will be deleted.
- Show the error-message returned from the Worker, when a getFullTexts-request fails.
- Improve some log-messages.
- Update dependencies.
- Code cleanup.
2021-12-06 20:18:30 +02:00
Lampros Smyrnaios 48eed20dd8 - Implement the "getAndUploadFullTexts" functionality. In order to access the S3-ObjectStore from one trusted place, the Controller will request the files from the workers and upload them on S3. Afterwards, the workers will delete those files from their local storage. Previously, each worker uploaded its own files.
- Move the "mergeParquetFiles" and "getCutBatchExceptionMessage" methods inside the "FileUtils" class.
- Code cleanup.
2021-11-30 18:23:27 +02:00
Lampros Smyrnaios 0d47c33a08 - Improve logging configurations.
- Fix the "AssignmentResponse.toString()" method.
- Update SpringBoot to v.2.5.6
- Code cleanup.
2021-11-04 11:57:19 +02:00
Lampros Smyrnaios c194af167f Allow handling of concurrent requests to the "getTestUrls"-endpoint. 2021-06-10 20:24:51 +03:00
Lampros Smyrnaios 308cab5ecd - Return an HTTP-500-error when the server cannot find the resourceFile requested by the "getTestUrls"-endpoint.
- Close the "inputScanner" after each use when retrieving the test-tasks.
- Show info-logs when sending an assignment to a worker.
- Code cleanup.
2021-06-10 14:21:39 +03:00
Lampros Smyrnaios 787299b5b7 Add the "Datasource" inside the "Task" class and include it in the Assignment. 2021-05-20 02:50:50 +03:00
Lampros Smyrnaios e2cc320baf - Add the "getTestUrls"-endpoint which returns an "Assignment" with data retrieved from the added resource-file.
- Update the "getUrls"-endpoint to be ready to retrieve data from the database, once it's added.
- Update the dependencies.
- Code cleanup.
2021-05-18 17:23:20 +03:00