UrlsController

Author	SHA1	Message	Date
Lampros Smyrnaios	44459c8681	- Rename "ImpalaConnector.java" to "DatabaseConnector.java". - Update dependencies. - Code polishing.	2023-08-23 16:55:23 +03:00
Lampros Smyrnaios	8dfb58ee63	Avoid assigning the same publications to the Workers multiple times, which started happening after the recent "parallelization enhancement". Since that enhancement, each Worker could request multiple assignment-batches before the Controller had processed its previous batches. As a result, whenever a batch was processed, the Controller deleted from the "assignment" table all the assignments (from every batch) delivered to the Worker that returned that batch, even though the "attempt" and "payload" records for the remaining batches had not yet been inserted into the DB. So, in a new assignments-batch request, publications that were still being processed could be delivered again to the same or other Workers. Now, for each finished batch, only the assignments of that batch are deleted from the "assignment" table.	2023-07-11 17:27:23 +03:00
Lampros Smyrnaios	a89abe3f2f	Prioritize the publications specified in the "publication_boost" table according to their "boost-level".	2023-06-29 12:32:06 +03:00
Lampros Smyrnaios	55ea5118ac	- Update the "testDatabaseName" property. - Code polishing.	2023-04-26 19:33:28 +03:00
Lampros Smyrnaios	4dc34429f8	- Increase the waiting-time before checking the Docker containers' status, in order to catch configuration crashes. - Code polishing.	2023-04-10 22:28:53 +03:00
Lampros Smyrnaios	c39fef2654	Upgrade the payload-table to a payload-view, which consists of three separate payload tables: "payload_legacy", "payload_aggregated" and "payload_bulk_import".	2023-04-10 15:55:50 +03:00
Lampros Smyrnaios	4280f89296	- Set the default value of the "isTestEnvironment" property to "true", in order to avoid undesired outcomes in the production db. - Code polishing.	2023-03-21 17:04:28 +02:00
Lampros Smyrnaios	c4670073ae	- Add missing refactoring-change. - Code polishing. - Update Spring.	2023-02-24 23:49:04 +02:00
Lampros Smyrnaios	c8baf5a5fc	- Fix not finding the parquet-schema files when the app was run inside a Docker Container. - Update the "namespaces" and the "names" inside the parquet schemas. - Code polishing.	2022-12-08 12:16:05 +02:00
Lampros Smyrnaios	95c38c4a24	- Fix the "assignment" table always being created in the testDatabase. - Code polishing.	2022-12-07 14:58:38 +02:00
Lampros Smyrnaios	3c5f4c6464	Fix bytes to MB conversion.	2022-12-07 14:32:18 +02:00
Lampros Smyrnaios	6226e2298d	- Upgrade the results-loading process: instead of making thousands of SQL-insert requests to Impala, we now write the results to parquet files, upload them to HDFS, and then import the data into the Impala tables with just 2 requests. This results in a huge performance improvement. One side effect of using the parquet files is that the timestamps are now BIGDECIMAL numbers instead of "Timestamp" objects; however, converting them to such objects is easy, if we ever need to. - Code polishing.	2022-11-10 17:18:21 +02:00
Lampros Smyrnaios	e2d53105d1	Fix not creating the "assignment" table in a new production database, which contains only the "publication" and "datasource" data.	2022-10-07 15:51:31 +03:00
Lampros Smyrnaios	e3b374a32f	- Optimize file-related tasks. - Update dependencies. - Code cleanup.	2022-05-26 15:43:59 +03:00
Lampros Smyrnaios	9b95eebb6c	- Remove the obsolete "parenthesis" and "increasing duplicate-num" from the full-texts' names before sending them to the S3-Object-Store. The names now end with the "file-hash", so they are guaranteed to be unique. The Worker continues to produce the previous kind of names without any disruption. - Improve logging. - Update MinIO dependency.	2022-04-11 21:15:22 +03:00
Lampros Smyrnaios	a81ed3c60f	- Add an "isTestEnvironment"-switch, which makes it easier to work with production and test databases. - In case the Worker cannot be reached during a full-texts' batch request, abort the rest of the batches. - Fix memory leaks when unzipping the batch-zip-file. - Add explanatory comments for picking the database related to a full-text file.	2022-04-08 17:39:45 +03:00
Lampros Smyrnaios	a23c918a42	- Fix a "@JsonProperty" annotation inside "Payload.java". - Fix a "@Value" annotation inside "FileUtils.java". - Add a new database and show its name along with the initial's name in the logs. - Code cleanup and improvement.	2022-04-05 00:01:44 +03:00
Lampros Smyrnaios	1111c850b9	- Add support for more than one full-text per id. Allow recognizing fileName additions: "id(1).pdf", "id(2).pdf", etc. - Fix not giving the databaseName in the "ImpalaController.get10PublicationIdsTest()". - Improve consistency in the "maxAttemptsPerRecord" value, among different threads. Also, reduce the value-increase by one. - Check if the tableName string is empty, in the "mergeParquetFiles". - Improve error-logging. - Set some local variables to "final", optimizing code-execution by the JVM.	2022-02-07 13:57:09 +02:00
Lampros Smyrnaios	be4898e43e	Bug fixes and improvements: - Fix an NPE when the "getTestUrls"-endpoint is called. It was thrown because some thread-local variables were not initialized per thread. - Fix JdbcTemplate error when querying the "getFileLocationForHashQuery". - Fix the "S3ObjectStore.isLocationInStore" check. - Fix not catching/handling some exceptions. - Fix/improve log-messages. - Optimize the "getFileLocationForHashQuery" to return only the first row. After the latest change, without this optimization, the query result would cause an exception, so the same-hash cases would not be handled. - Optimize the "ImpalaConnector.databaseLock.lock()" positioning. - Update the "getTestUrls" api-path. - Optimize list-allocation. - Re-add the info-message about the successful emptying of the S3-bucket. - Code cleanup.	2022-02-02 20:19:46 +02:00
Antonis Lempesis	35966b6f6e	finishing touches	2022-02-01 16:57:28 +02:00
Antonis Lempesis	e9bede5c45	more fixes	2022-02-01 02:08:02 +02:00
Antonis Lempesis	9ac10fc4b3	fixed Value annotations	2022-01-31 14:01:26 +02:00
Antonis Lempesis	1c82088a7c	fixed Value annotations	2022-01-31 13:49:14 +02:00
Antonis Lempesis	6dde8c0faa	finished merge	2022-01-31 04:17:16 +02:00
Antonis Lempesis	bf26bf955f	springified project	2022-01-30 22:14:52 +02:00
Lampros Smyrnaios	d0ab42e4fa	- Change the scheme of the file-location URI. - Move the old and the current database names into the "application.properties" file. - Improve logging.	2022-01-28 07:24:42 +02:00
Lampros Smyrnaios	ab99bc6168	- Make sure the temp table "current_assignment" left over from a previously cancelled execution is dropped and purged on startup. - Improve logging. - Code cleanup.	2022-01-19 01:37:47 +02:00
Lampros Smyrnaios	33ba3e8d91	- Avoid getting and uploading (to S3) full-texts that were already uploaded by previous assignments-batches. - Fix not updating the fileLocation with the s3Url for records which share the same full-text. - Set only one delete-order for each assignments-batch's files, not one (or more, by mistake) per zip-batch. - Set the HttpStatus to "204 - NO_CONTENT", when no assignments are available to be returned to the Worker. - Fix not unlocking the "dataBaseLock" in case of a "dataBase-connection"-error, in "addWorkerReport()". - Improve some log-messages. - Change the log-level for the "S3-bucket already exists" message. - Update Gradle. - Optimize imports. - Code cleanup.	2021-12-21 15:55:27 +02:00
Lampros Smyrnaios	780ed15ce2	- Fix a "databaseLock" bug, which could cause both the payload and attempt inserts and the "mergeParquetFiles" to fail, as the inserts could be executed concurrently with tables-compaction. - Fix the "null" representation of an "unknown" payload-size in the database. - Remove the obsolete thread-locking for the "CreateDatabase" operation. This code is guaranteed to run BEFORE any other operation in the database. - Implement the "handlePreparedStatementException" and "closeConnection" methods. - Improve error-logs. - Update dependencies. - Code cleanup.	2021-11-30 13:26:19 +02:00
Lampros Smyrnaios	d100af35d0	- Implement the "getUrls" and "addWorkerReport" endpoints with full database-handling. - Add connectivity with an Impala-database and create a dedicated Controller for future statistics-requests. - Optimize the "getTestUrls"-endpoint. - Disable the "reportCurrentTime()" scheduled-task. - Update dependencies and bump project's version to '1.0.0-SNAPSHOT'. - Set the logging-appender to "File". - Code cleanup.	2021-11-09 23:59:27 +02:00

30 Commits