UrlsController/src/main/java/eu/openaire/urls_controller/models
Lampros Smyrnaios 39c36f9e66 - Resolve a concurrency issue, by enforcing synchronization on the "BulkImportReport.getJsonReport()" method.
- Increase the number of stacktrace-lines to 20, for bulkImport-segment-failures.
- Improve "GenericUtils.getSelectiveStackTrace()".
2024-05-01 01:29:25 +03:00
..
Assignment.java Avoid assigning the same publications multiple times to the Workers, after the recent "parallelization enchantment". 2023-07-11 17:27:23 +03:00
Attempt.java - Upgrade the results-loading process: Instead of making thousands of sql-insert requests to Impala now we write the results to parquet files, upload them to HDFS and then import the data into the Impala tables with just 2 requests. This results in a huge performance improvement. 2022-11-10 17:18:21 +02:00
BulkImportReport.java - Resolve a concurrency issue, by enforcing synchronization on the "BulkImportReport.getJsonReport()" method. 2024-05-01 01:29:25 +03:00
BulkImportResponse.java Update Bulk-Import API: 2023-07-25 11:59:47 +03:00
Datasource.java Add the "Datasource" inside the "Task" class and include it in the Assignment. 2021-05-20 02:50:50 +03:00
DocFileData.java New feature: BulkImport full-text files from compatible datasources. 2023-05-11 03:07:55 +03:00
Error.java - Process the Error of PDF-aggregation. Distinguish between "couldRetry" and "noRetry" cases. 2021-08-05 15:43:37 +03:00
FileLocationData.java Various improvements: 2024-03-29 17:23:01 +02:00
ParquetReport.java - Apply error-checking on individual CallableTasks and in tasks-batches related to the creation and upload of all the data related to the "attempt" and "payload" table. So, if no data could be uploaded for one or both tables, no "load"-queries will be executed for that/those tables. 2022-12-09 12:46:06 +02:00
Payload.java - Add error-handling for the case when no payloads could be associated with a specific url which should have been in the hashMultiMap in "addUrlReportsByMatchingRecordsFromBacklog". 2024-03-11 19:48:04 +02:00
SumParquetSuccess.java - Apply error-checking on individual CallableTasks and in tasks-batches related to the creation and upload of all the data related to the "attempt" and "payload" table. So, if no data could be uploaded for one or both tables, no "load"-queries will be executed for that/those tables. 2022-12-09 12:46:06 +02:00
Task.java Add the "Datasource" inside the "Task" class and include it in the Assignment. 2021-05-20 02:50:50 +03:00
UrlReport.java - Add the "isControllerAlive"-endpoint. 2021-09-23 15:08:52 +03:00
WorkerInfo.java - Use separate HDFS subdirectories for each worker in order to avoid seeing exceptions about "empty hdfs directory" when "loading" data to the database, because one worker has loaded data generated by multiple workers (since we use only 1 load operation for multiple parquet files). 2023-05-15 13:12:20 +03:00