Commit Graph

  • aad37cd81e Add the "StatsController", which brings the "getNumberOfPayloads" and "getNumberOfRecordsInspected" endpoints. Lampros Smyrnaios 2022-10-18 15:00:26 +0300
  • e2d53105d1 Fix not creating the "assignment" table in a new production database, which contains only the "publication" and "datasource" data. Lampros Smyrnaios 2022-10-07 15:51:31 +0300
  • b6340066a7 - Improve handling of the case, where the full-texts were found, but the Controller could not acquire them from the Worker. - Add/improve logs and comments. - Code cleanup. Lampros Smyrnaios 2022-09-28 22:34:33 +0300
  • a22144bd51 - Refactor "FileUtils.getErrorMessageFromResponseBody(conn)" into "FileUtils.getMessageFromResponseBody(conn, isError)", in order to be able to either retrieve the "normal" or the "error" response. - Add comments. Lampros Smyrnaios 2022-09-15 23:12:05 +0300
  • 3e8f9c6074 Update the "UriBuilder.java" to be able to acquire the running port of the server, in case the port-number was initially set to "random" (0). Also make sure we get the "localHostAddress" and not the "localHostName", in case the public IP is not retrievable. Lampros Smyrnaios 2022-09-12 17:04:05 +0300
  • a2cd02115f - Update the Spring-Security-code to use the "SecurityFilterChain", as the previous code was deprecated. - Update dependencies. Lampros Smyrnaios 2022-06-27 21:41:32 +0300
  • e3b374a32f - Optimize file-related tasks. - Update dependencies. - Code cleanup. Lampros Smyrnaios 2022-05-26 15:43:59 +0300
  • 9096137008 Update documentation. Lampros Smyrnaios 2022-04-14 14:42:36 +0300
  • 9b95eebb6c - Remove the obsolete "parenthesis" and "increasing duplicate-num" from the full-texts' names, before sending them to the S3-Object-Store. They now end with the "file-hash", so it is guaranteed that they will be unique. The Worker continues to produce the previous kind of names, without any disturbance. - Improve logging. - Update MinIO dependency. Lampros Smyrnaios 2022-04-11 21:15:22 +0300
  • a81ed3c60f - Add an "isTestEnvironment"-switch, which makes it easier to work with production and test databases. - In case the Worker cannot be reached during a full-texts' batch request, abort the rest of the batches. - Fix memory leaks when unzipping the batch-zip-file. - Add explanatory comments for picking the database related to a full-text file. Lampros Smyrnaios 2022-04-08 17:39:45 +0300
  • 33fc61a8d9 - Fix the fileName-ID not being directly related with the datasourceID, in the S3-ObjectStore name. Add explanatory comments. - Add missing error-logs. Lampros Smyrnaios 2022-04-05 16:22:02 +0300
  • a23c918a42 - Fix a "@JsonProperty" annotation inside "Payload.java". - Fix a "@Value" annotation inside "FileUtils.java". - Add a new database and show its name along with the initial's name in the logs. - Code cleanup and improvement. Lampros Smyrnaios 2022-04-05 00:01:44 +0300
  • 5e4fad2479 - Change the fileNames' structure in the S3-ObjectStore. - Update dependencies. Lampros Smyrnaios 2022-04-01 19:24:04 +0300
  • 48670f3399 - Show the percentage of the "NumFullTextsFound", in the logs. - Update dependencies. Lampros Smyrnaios 2022-03-28 14:29:31 +0300
  • e587b2ca6c Update Spring dependencies. Lampros Smyrnaios 2022-02-25 17:41:24 +0200
  • 88acaae20f - Replace the "numFullTextUrlsFound"-counter with "numFullTextsFound"-counter to reflect the end result of the actually available full-texts (which were downloaded by the Worker). - Optimize the gather-fileNames loop. - Improve a message in "installAndRun.sh" Lampros Smyrnaios 2022-02-23 17:40:06 +0200
  • ad5dbdde9b - Improve performance when inserting records into the "attempt" table, by splitting the records equally, across more threads. - Bring back the "UriBuilder", which informs us in the logs, about the Controller's url (IP, PORT, API). - Code cleanup. Lampros Smyrnaios 2022-02-22 13:54:16 +0200
  • dfd40cb105 Insert only the records with uploaded-to-S3 full-texts, in the "payload" table. Lampros Smyrnaios 2022-02-17 16:27:40 +0200
  • 71f6b46130 - In case of an error when creating the "current_assignment" table (e.g. out of memory in the backend database server), check for partial-creation and drop it. Also, in any case, before we drop this table, now check if it exists first (in general it should always exist, unless the creation results in an error and the table was not created at all). - Fix an error-message. - Update dependencies. - Code cleanup. Lampros Smyrnaios 2022-02-14 12:36:00 +0200
  • d2ed9cd9ed Improve efficiency and performance when processing the full-texts. Lampros Smyrnaios 2022-02-08 15:02:13 +0200
  • 5819bf584b Update the README.md Lampros Smyrnaios 2022-02-07 21:11:03 +0200
  • 1111c850b9 - Add support for more than one full-text per id. Allow recognizing fileName additions: "id(1).pdf", "id(2).pdf", etc. - Fix not giving the databaseName in the "ImpalaController.get10PublicationIdsTest()". - Improve consistency in the "maxAttemptsPerRecord" value, among different threads. Also, reduce the value-increase by one. - Check if the tableName string is empty, in the "mergeParquetFiles". - Improve error-logging. - Set some local variables to "final", optimizing code-execution by the JVM. Lampros Smyrnaios 2022-02-07 13:57:09 +0200
  • 5d70e82504 Merge pull request 'Springify and dockerize project (fixed and improved)' (#2) from springify_project into master Lampros Smyrnaios 2022-02-04 14:56:16 +0100
  • b206114144 - Allow the user to build, push and run the App in Docker, straight through the "installAndRun.sh" script. - Re-add the logback-spring configuration. - Change the docker-app name. #2 springify_project Lampros Smyrnaios 2022-02-04 15:49:56 +0200
  • 6aab1d242b - Improve performance when handling WorkerReports' database insertions, by using parallelism to insert to two different tables at the same time. Also, pre-cache the query-argument-types. - Update the error-message and counting system, on partial insertion event. Lampros Smyrnaios 2022-02-04 14:48:22 +0200
  • be4898e43e Bug fixes and improvements: - Fix an NPE, when the "getTestUrls"-endpoint is called. It was thrown because of an absent per-thread initialization of some thread-local variables. - Fix JdbcTemplate error when querying the "getFileLocationForHashQuery". - Fix the "S3ObjectStore.isLocationInStore" check. - Fix not catching/handling some exceptions. - Fix/improve log-messages. - Optimize the "getFileLocationForHashQuery" to return only the first row. In the latest change, without this optimization, the query-result would cause non-handling the same-hash cases, because of an exception. - Optimize the "ImpalaConnector.databaseLock.lock()" positioning. - Update the "getTestUrls" api-path. - Optimize list-allocation. - Re-add the info-message about the successful emptying of the S3-bucket. - Code cleanup. Lampros Smyrnaios 2022-02-02 20:19:46 +0200
  • d1c86ff273 Merge pull request 'Springify project' (#1) from antonis.lempesis/UrlsController:master into springify_project Lampros Smyrnaios 2022-02-01 19:51:50 +0100
  • 35966b6f6e finishing touches #1 Antonis Lempesis 2022-02-01 16:57:28 +0200
  • c093e52d15 Merge branch 'master' of https://code-repo.d4science.org/antonis.lempesis/UrlsController Antonis Lempesis 2022-02-01 02:08:21 +0200
  • e9bede5c45 more fixes Antonis Lempesis 2022-02-01 02:08:02 +0200
  • f5748434c7 Merge branch 'master' of https://code-repo.d4science.org/antonis.lempesis/UrlsController Antonis Lempesis 2022-01-31 14:01:39 +0200
  • 9ac10fc4b3 fixed Value annotations Antonis Lempesis 2022-01-31 14:01:26 +0200
  • 0772e9cdfb Merge branch 'master' of https://code-repo.d4science.org/antonis.lempesis/UrlsController Antonis Lempesis 2022-01-31 13:49:34 +0200
  • 1c82088a7c fixed Value annotations Antonis Lempesis 2022-01-31 13:49:14 +0200
  • 3da6fd98e9 added Dockerfile Antonis Lempesis 2022-01-31 04:21:31 +0200
  • 6dde8c0faa finished merge Antonis Lempesis 2022-01-31 04:17:16 +0200
  • e47fd8d97b merged refactor branch Antonis Lempesis 2022-01-30 23:10:06 +0200
  • 3741cce886 springified project Antonis Lempesis 2022-01-30 22:15:13 +0200
  • bf26bf955f springified project Antonis Lempesis 2022-01-30 22:14:52 +0200
  • d0ab42e4fa - Change the scheme of the file-location URI. - Move the old and the current database names in the "application.properties" file. - Improve logging. Lampros Smyrnaios 2022-01-28 07:24:42 +0200
  • 92b11baf93 - Update the repository for the Impala JDBC Driver. - Code cleanup. Lampros Smyrnaios 2022-01-28 00:59:19 +0200
  • 91f460ce51 moved impala jar to omtd repository and updated build file Antonis Lempesis 2022-01-28 00:41:29 +0200
  • a01e11eef0 When all the data is processed, increase the number of "max-attempts" to retry some very old records, in the next requests. Lampros Smyrnaios 2022-01-27 01:18:26 +0200
  • 3c9f8870d1 - Change the repository for the Impala JDBC Driver, as the previous one had networking issues. - Optimize the "findAssignmentsQuery". Lampros Smyrnaios 2022-01-26 19:52:46 +0200
  • ff46839158 Fix not prioritizing the gradle version defined inside the "installAndRun.sh" script. Lampros Smyrnaios 2022-01-21 15:45:12 +0200
  • 8d9336fa52 Update dependencies. Lampros Smyrnaios 2022-01-21 15:04:29 +0200
  • ab99bc6168 - Make sure the temp table "current_assignment" from a cancelled previous execution, is dropped and purged on startup. - Improve logging. - Code cleanup. Lampros Smyrnaios 2022-01-19 01:37:47 +0200
  • 83f40a23d9 Bring back the prepared-statements for the insert-queries. After the fix of the "broken pipe"-error, they now work. Bringing them back, increases security and solves the "SQL syntax errors" caused by the values of some URLs. Lampros Smyrnaios 2022-01-13 00:54:21 +0200
  • 2cf25b0d26 - Fix Impala "broken pipe" error, by closing the connection when not in need. The connection is reopened later with minimal overhead, as a connection pool is used. - Fix not closing the database-connection in case of a specific error (also in a commented error-case). Lampros Smyrnaios 2022-01-13 00:47:15 +0200
  • 82bf11b9b3 - Workaround a bug of Impala-JDBC-Driver, when creating insert-prepared-statements. - Update dependencies. Lampros Smyrnaios 2021-12-24 00:25:50 +0200
  • 33ba3e8d91 - Avoid getting and uploading (to S3), full-texts which are already uploaded by previous assignments-batches. - Fix not updating the fileLocation with the s3Url for records which share the same full-text. - Set only one delete-order for each assignments-batch-files, not one (or more, by mistake) per zip-batch. - Set the HttpStatus to "204 - NO_CONTENT", when no assignments are available to be returned to the Worker. - Fix not unlocking the "dataBaseLock" in case of a "dataBase-connection"-error, in "addWorkerReport()". - Improve some log-messages. - Change the log-level for the "S3-bucket already exists" message. - Update Gradle. - Optimize imports. - Code cleanup. Lampros Smyrnaios 2021-12-21 15:55:27 +0200
  • 0178e44574 - Increase security by sanitizing the value of the "workerId" before using it in sql-statements. Impala has bugs with some types of PreparedStatements. - Improve reliability, by dropping the "current_assignment" table in case of an error, so that the next "getUrls"-request will not fail. - Fix the "databaseLock" not being unlocked when the "addWorkerReport()" method returned early on some error-cases. - Delete the "assignment"-data after inserting the related payloads and attempts in the database. Lampros Smyrnaios 2021-12-10 21:47:58 +0200
  • a46ab84f10 - Increase the lower and upper limits for the Java Heap Size. - Update the "ServerBaseURL" to the Public IP Address of the machine which is running the app. Lampros Smyrnaios 2021-12-06 20:27:39 +0200
  • dea257b87f - Fix a bug, which caused the get-full-texts request to fail, because of the wrong "requestAssignmentsCounter". - Fix a bug, which caused multiple workers to get assigned the same batch-counter, while the assignment-tasks were different. - Set a max-size limit to the amount of space the logs can use. Over that size, the older logs will be deleted. - Show the error-message returned from the Worker, when a getFullTexts-request fails. - Improve some log-messages. - Update dependencies. - Code cleanup. Lampros Smyrnaios 2021-12-06 20:18:30 +0200
  • 15224c6468 Improve performance in the "getUrls"-endpoint, and more: - Optimize the "findAssignmentsQuery" by using an inner limit (larger than the outer). - Save a ton of time from inserting the assignments into the database, by using a temporary table to hold the new assignments, in order for them to be easily accessible both from the Controller (which processes them and sends them to the Worker) and the database itself, in order to "import" them into the "assignment"-table. - Replace the "Date" with "Timestamp", in order to hold more detailed information. - Code cleanup. Lampros Smyrnaios 2021-11-30 19:59:46 +0200
  • 48eed20dd8 - Implement the "getAndUploadFullTexts" functionality. In order to access the S3-ObjectStore from one trusted place, the Controller will request the files from the workers and upload them on S3. Afterwards, the workers will delete those files from their local storage. Previously, each worker uploaded its own files. - Move the "mergeParquetFiles" and "getCutBatchExceptionMessage" methods inside the "FileUtils" class. - Code cleanup. Lampros Smyrnaios 2021-11-30 18:23:27 +0200
  • 780ed15ce2 - Fix a "databaseLock" bug, which could cause both the payload and attempt inserts and the "mergeParquetFiles" to fail, as the inserts could be executed concurrently with tables-compaction. - Fix the "null" representation of an "unknown" payload-size in the database. - Remove the obsolete thread-locking for the "CreateDatabase" operation. This code is guaranteed to run BEFORE any other operation in the database. - Implement the "handlePreparedStatementException" and "closeConnection" methods. - Improve error-logs. - Update dependencies. - Code cleanup. Lampros Smyrnaios 2021-11-30 13:26:19 +0200
  • d100af35d0 - Implement the "getUrls" and "addWorkerReport" endpoints with full database-handling. - Add connectivity with an Impala-database and create a dedicated Controller for future statistics-requests. - Optimize the "getTestUrls"-endpoint. - Disable the "reportCurrentTime()" scheduled-task. - Update dependencies and bump project's version to '1.0.0-SNAPSHOT'. - Set the logging-appender to "File". - Code cleanup. Lampros Smyrnaios 2021-11-09 23:59:27 +0200
  • 0d47c33a08 - Improve logging configurations. - Fix the "AssignmentResponse.toString()" method. - Update SpringBoot to v.2.5.6 - Code cleanup. Lampros Smyrnaios 2021-11-04 11:57:19 +0200
  • 0540820817 - Update "installAndRun.sh": -- Check if a gradle installation with the given version already exists, before downloading and installing gradle. -- Make sure the "unzip" package is installed, before trying to unzip the gradle package. - Update the logs' fileName. Lampros Smyrnaios 2021-10-14 02:46:33 +0300
  • d931315ced - Add the "isControllerAlive"-endpoint. - Change the data-type of the "UrlReport.status" to be "enum StatusType", in order to increase consistency and comparability. - Change the "Date" datatype in "Payload" to have the SQL's version. - Fix the project's name inside "settings.gradle". - Code cleanup. Lampros Smyrnaios 2021-09-23 15:08:52 +0300
  • 983b900da7 - Add the "installAndRun.sh" script. - Update the README. - Update the dependencies. Lampros Smyrnaios 2021-09-09 15:56:37 +0300
  • d56e988518 - Process the Error of PDF-aggregation. Distinguish between "couldRetry" and "noRetry" cases. - Update the "RequestParam" for the getUrls endpoints. - Fix the "assignmentCounter". - Code cleanup. Lampros Smyrnaios 2021-08-05 15:43:37 +0300
  • 25c566bf68 - Change server's port. - Update dependencies. Lampros Smyrnaios 2021-07-29 08:44:36 +0300
  • 27375b9396 - Refactor the Assignment-creation. In order to match the database, now we have a list of Assignments sent through the AssignmentResponse, instead of a single Assignment having a list of tasks. - Cleanup the members of the "Payload" model (also prepare for database integration). Lampros Smyrnaios 2021-07-05 14:04:39 +0300
  • 5e7ccbd8c6 Add the "addWorkerReport" endpoint. Lampros Smyrnaios 2021-06-22 05:38:48 +0300
  • 40763ec146 Update the "WorkerReport" request and the "UrlReport" and "Payload" models. Lampros Smyrnaios 2021-06-19 07:07:36 +0300
  • c194af167f Allow handling of concurrent requests to the "getTestUrls"-endpoint. Lampros Smyrnaios 2021-06-10 20:24:51 +0300
  • 308cab5ecd - Return an HTTP-500-error when the server cannot find the resourceFile requested by the "getTestUrls"-endpoint. - Close the "inputScanner" after each use when retrieving the test-tasks. - Show info-logs when sending an assignment to a worker. - Code cleanup. Lampros Smyrnaios 2021-06-10 14:21:39 +0300
  • 6729f51b03 Add an "assignmentId" field in the "Assignment"-class. Lampros Smyrnaios 2021-06-09 05:48:54 +0300
  • 87044574b5 Update dependencies and add the "gradle-wrapper.properties" file which defines the gradle version. Lampros Smyrnaios 2021-06-08 19:12:40 +0300
  • 787299b5b7 Add the "Datasource" inside the "Task" class and include it in the Assignment. Lampros Smyrnaios 2021-05-20 02:50:50 +0300
  • d20fcf9cce - Update the "getUrls" and "getTestUrls" endpoints to take the data from the workers, in http-request-parameters instead of the http-body. - Fix tasksLimit-check. - Code cleanup. Lampros Smyrnaios 2021-05-19 02:32:46 +0300
  • e2cc320baf - Add the "getTestUrls"-endpoint which returns an "Assignment" with data retrieved from the added resource-file. - Update the "getUrls"-endpoint to be ready to retrieve data from the database, once it's added. - Update the dependencies. - Code cleanup. Lampros Smyrnaios 2021-05-18 17:23:20 +0300
  • d3588ea36b Add the "DownloadAttempt" class. Lampros Smyrnaios 2021-04-24 21:44:51 +0300
  • a6ab810ad3 Update classes: "Publication" and "Payload". Lampros Smyrnaios 2021-04-24 21:40:10 +0300
  • 85ecc4a36b Add classes: "AssignmentResponse", "WorkerReport", "WorkerRequest", "UrlReport". Lampros Smyrnaios 2021-04-24 21:06:52 +0300
  • c2ea8a69de Update classes: "Assignment", "Task", "Error", "Payload", "UrlsRequest". Lampros Smyrnaios 2021-04-24 21:05:21 +0300
  • 89c6a73a30 Add "Assignment", "Task" and "Error" classes. Lampros Smyrnaios 2021-04-15 03:36:08 +0300
  • c6e12d3e95 - Update "addResults"-endpoint. - Add "UrlsRequest.java". - Some minor updates in "build.gradle" and "application.properties". Lampros Smyrnaios 2021-03-16 18:07:30 +0200
  • 8a4376da9c Initial commit of UrlsController. Lampros Smyrnaios 2021-03-16 15:25:15 +0200