UrlsController/src/main/java/eu/openaire/urls_controller/models

Latest commit: Lampros Smyrnaios (8bc5cc35e2), 2024-03-22:
- Optimize writing to the Bulk-import-report file.
- Show the IP of the worker which posts a "workerShutdownReport".
- Code polishing.
Assignment.java: Avoid assigning the same publications multiple times to the Workers, after the recent "parallelization enhancement". (2023-07-11)
Attempt.java: Upgrade the results-loading process: instead of making thousands of SQL-insert requests to Impala, the results are now written to parquet files, uploaded to HDFS and then imported into the Impala tables with just 2 requests, which gives a huge performance improvement (see the loading sketch after this listing). (2022-11-10)
BulkImportReport.java: Optimize writing to the Bulk-import-report file. (2024-03-22)
BulkImportResponse.java: Update the Bulk-Import API. (2023-07-25)
Datasource.java: Add the "Datasource" inside the "Task" class and include it in the Assignment. (2021-05-20)
DocFileData.java: New feature: BulkImport full-text files from compatible datasources. (2023-05-11)
Error.java: Process the Error of the PDF-aggregation; distinguish between "couldRetry" and "noRetry" cases (see the model sketch after this listing). (2021-08-05)
FileLocationData.java: New feature: BulkImport full-text files from compatible datasources. (2023-05-11)
ParquetReport.java: Apply error-checking on individual CallableTasks and on task-batches related to the creation and upload of all the data for the "attempt" and "payload" tables, so that if no data could be uploaded for one or both tables, no "load" queries are executed for that/those tables (see the batch-checking sketch after this listing). (2022-12-09)
Payload.java: Add error-handling for the case when no payloads could be associated with a specific url which should have been in the hashMultiMap in "addUrlReportsByMatchingRecordsFromBacklog". (2024-03-11)
SumParquetSuccess.java: Apply error-checking on individual CallableTasks and on task-batches for the "attempt" and "payload" data, so that no "load" queries are executed for a table that received no data (same change as in ParquetReport.java). (2022-12-09)
Task.java: Add the "Datasource" inside the "Task" class and include it in the Assignment. (2021-05-20)
UrlReport.java: Add the "isControllerAlive" endpoint. (2021-09-23)
WorkerInfo.java: Use separate HDFS subdirectories for each worker, in order to avoid "empty hdfs directory" exceptions when "loading" data to the database; these occurred because one worker could have loaded data generated by multiple workers, since only 1 load operation is used for multiple parquet files (see the path sketch after this listing). (2023-05-15)
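
The results-loading change noted for Attempt.java (write parquet files, upload them to HDFS, then fill the Impala tables with just two "load" requests, one per table) can be pictured with a rough JDBC sketch. The JDBC URL, table names and HDFS directories below are assumptions for illustration, not the Controller's actual configuration:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Rough sketch of the "2 requests instead of thousands of inserts" idea:
// the parquet files are assumed to already be uploaded to the HDFS directories below,
// and each table is then filled with a single LOAD DATA query over its whole directory.
public class ParquetLoadSketch {

    // Hypothetical connection string; the real deployment details differ.
    private static final String IMPALA_JDBC_URL = "jdbc:impala://impala-host:21050/pdfaggregation_db";

    public static void loadResults(String attemptHdfsDir, String payloadHdfsDir) throws Exception {
        try (Connection con = DriverManager.getConnection(IMPALA_JDBC_URL);
             Statement stmt = con.createStatement()) {
            // One request per table, instead of one INSERT per result-row.
            stmt.execute("LOAD DATA INPATH '" + attemptHdfsDir + "' INTO TABLE attempt");
            stmt.execute("LOAD DATA INPATH '" + payloadHdfsDir + "' INTO TABLE payload");
        }
    }
}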
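
Error.java models the "couldRetry" / "noRetry" distinction for failed PDF-aggregation attempts. A minimal sketch of such a model class, assuming illustrative field and enum names rather than the real ones:

// Hypothetical sketch of an error model distinguishing "couldRetry" from "noRetry" cases.
// Field and enum names are illustrative; they are not taken from the actual Error.java.
public class Error {

    public enum ErrorType {
        couldRetry,  // transient failure (e.g. network timeout); the url may be re-attempted later
        noRetry      // permanent failure (e.g. malformed url, 404); do not re-schedule
    }

    private String errorMessage;
    private ErrorType errorType;

    public Error(String errorMessage, ErrorType errorType) {
        this.errorMessage = errorMessage;
        this.errorType = errorType;
    }

    public String getErrorMessage() { return errorMessage; }
    public ErrorType getErrorType() { return errorType; }
}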
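
The error-checking described for ParquetReport.java and SumParquetSuccess.java boils down to: run the batch of parquet-creation/upload tasks, inspect every per-file result, and only issue a "load" query for a table that actually received at least one file. A generic java.util.concurrent sketch of that pattern, with an assumed, simplified report shape:

import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Generic sketch: run a batch of parquet-creation/upload tasks and decide, per table,
// whether a "load" query should be issued at all. Names and shapes are illustrative.
public class BatchCheckSketch {

    // Minimal stand-in for a per-file report (the real ParquetReport carries more info).
    record FileReport(boolean isForAttemptTable, boolean successful) {}

    public static void runBatch(List<Callable<FileReport>> tasks) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(4);
        try {
            List<Future<FileReport>> futures = executor.invokeAll(tasks);
            boolean anyAttemptFileUploaded = false;
            boolean anyPayloadFileUploaded = false;
            for (Future<FileReport> future : futures) {
                FileReport report = future.get();  // also surfaces exceptions thrown inside a task
                if (report.successful()) {
                    if (report.isForAttemptTable()) anyAttemptFileUploaded = true;
                    else anyPayloadFileUploaded = true;
                }
            }
            // Only load tables for which at least one parquet file was actually uploaded.
            if (anyAttemptFileUploaded) { /* execute the LOAD DATA query for the "attempt" table */ }
            if (anyPayloadFileUploaded) { /* execute the LOAD DATA query for the "payload" table */ }
        } finally {
            executor.shutdown();
        }
    }
}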
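
The per-worker HDFS subdirectories mentioned for WorkerInfo.java can be seen as a simple path convention: each worker's parquet files land under its own subdirectory, so a single "load" operation never picks up (or misses) files produced by another worker. A tiny path-building sketch, with an assumed base directory and naming scheme:

// Illustrative only: builds the per-worker HDFS subdirectory used for parquet uploads.
// The base path and naming scheme are assumptions, not the Controller's configuration.
public class WorkerHdfsPaths {

    private static final String HDFS_PARQUET_BASE_DIR = "/user/urls_controller/parquet_uploads/";

    public static String getWorkerSubDir(String workerId) {
        // e.g. "/user/urls_controller/parquet_uploads/worker_X/" for worker "worker_X"
        return HDFS_PARQUET_BASE_DIR + workerId + "/";
    }
}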