UrlsController/src/main/java/eu/openaire/urls_controller/models
Lampros Smyrnaios 0ab6bae93a - Optimize the json-conversion of the "BulkImportReport".
- Code polishing.
2023-05-18 17:30:40 +03:00
Assignment.java Improve performance in the "getUrls"-endpoint, and more: 2021-11-30 19:59:46 +02:00
Attempt.java - Upgrade the results-loading process: instead of making thousands of SQL-insert requests to Impala, we now write the results to parquet files, upload them to HDFS, and then import the data into the Impala tables with just 2 requests. This results in a huge performance improvement. 2022-11-10 17:18:21 +02:00
BulkImportReport.java - Optimize the json-conversion of the "BulkImportReport". 2023-05-18 17:30:40 +03:00
Datasource.java Add the "Datasource" inside the "Task" class and include it in the Assignment. 2021-05-20 02:50:50 +03:00
DocFileData.java New feature: BulkImport full-text files from compatible datasources. 2023-05-11 03:07:55 +03:00
Error.java - Process the Error of PDF-aggregation. Distinguish between "couldRetry" and "noRetry" cases. 2021-08-05 15:43:37 +03:00
FileLocationData.java New feature: BulkImport full-text files from compatible datasources. 2023-05-11 03:07:55 +03:00
ParquetReport.java - Apply error-checking on individual CallableTasks and on task-batches that create and upload the data for the "attempt" and "payload" tables. So, if no data could be uploaded for one or both tables, no "load"-queries will be executed for those tables. 2022-12-09 12:46:06 +02:00
Payload.java - Fix a "@JsonProperty" annotation inside "Payload.java". 2022-04-05 00:01:44 +03:00
SumParquetSuccess.java - Apply error-checking on individual CallableTasks and on task-batches that create and upload the data for the "attempt" and "payload" tables. So, if no data could be uploaded for one or both tables, no "load"-queries will be executed for those tables. 2022-12-09 12:46:06 +02:00
Task.java Add the "Datasource" inside the "Task" class and include it in the Assignment. 2021-05-20 02:50:50 +03:00
UrlReport.java - Add the "isControllerAlive"-endpoint. 2021-09-23 15:08:52 +03:00
WorkerInfo.java - Use separate HDFS subdirectories for each worker in order to avoid seeing exceptions about "empty hdfs directory" when "loading" data to the database, because one worker has loaded data generated by multiple workers (since we use only 1 load operation for multiple parquet files). 2023-05-15 13:12:20 +03:00
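
The commit message shown for ParquetReport.java and SumParquetSuccess.java describes skipping the "load"-queries for any table whose data could not be uploaded. The following is a minimal sketch of that rule only. The `LoadPlanner` class, its method, and the boolean flags are hypothetical names introduced for illustration; they are not taken from the repository, and the actual classes may be structured differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the repository: it only illustrates the
// "skip the load-query when nothing was uploaded" rule from the commit message.
class LoadPlanner {

    // Returns the names of the tables for which a "load"-query should run.
    // A table is included only if at least some of its data was uploaded.
    static List<String> tablesToLoad(boolean attemptDataUploaded, boolean payloadDataUploaded) {
        List<String> tables = new ArrayList<>();
        if (attemptDataUploaded) {
            tables.add("attempt");
        }
        if (payloadDataUploaded) {
            tables.add("payload");
        }
        return tables;
    }
}
```

Under these assumptions, `LoadPlanner.tablesToLoad(true, false)` contains only `attempt`, and `LoadPlanner.tablesToLoad(false, false)` is empty, so no "load"-query would be issued.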