UrlsController/src/main/java/eu/openaire/urls_controller/models
Lampros Smyrnaios 66a5b3c7da Update Bulk-Import API:
- Increase the "numOfThreadsPerBulkImportProcedure" to 6.
- Fix Bulk import not working from a second-level subdirectory; the report-subDirectory was not created.
- Fix not returning the bulk-import-report as "application/json".
- Add useful messages for missing parameters.
- Change the HTTP-method for the "bulkImportFullTexts" endpoint to "POST".
- Show a structured json-response for the "bulkImportFullTexts" endpoint.
- Fix uncommon date-format.
- Remove single quotes from json-report, since they are returned as bytes, not characters.
- Optimize the generation of the json-bulkImport-report.
2023-07-25 11:59:47 +03:00
Assignment.java Avoid assigning the same publications multiple times to the Workers, after the recent "parallelization enhancement". 2023-07-11 17:27:23 +03:00
Attempt.java - Upgrade the results-loading process: instead of making thousands of sql-insert requests to Impala, we now write the results to parquet files, upload them to HDFS, and then import the data into the Impala tables with just 2 requests. This results in a huge performance improvement. 2022-11-10 17:18:21 +02:00
BulkImportReport.java Update Bulk-Import API: 2023-07-25 11:59:47 +03:00
BulkImportResponse.java Update Bulk-Import API: 2023-07-25 11:59:47 +03:00
Datasource.java Add the "Datasource" inside the "Task" class and include it in the Assignment. 2021-05-20 02:50:50 +03:00
DocFileData.java New feature: BulkImport full-text files from compatible datasources. 2023-05-11 03:07:55 +03:00
Error.java - Process the Error of PDF-aggregation. Distinguish between "couldRetry" and "noRetry" cases. 2021-08-05 15:43:37 +03:00
FileLocationData.java New feature: BulkImport full-text files from compatible datasources. 2023-05-11 03:07:55 +03:00
ParquetReport.java - Apply error-checking on individual CallableTasks and in tasks-batches related to the creation and upload of all the data for the "attempt" and "payload" tables. So, if no data could be uploaded for one or both tables, no "load"-queries will be executed for that/those tables. 2022-12-09 12:46:06 +02:00
Payload.java - Fix a "@JsonProperty" annotation inside "Payload.java". 2022-04-05 00:01:44 +03:00
SumParquetSuccess.java - Apply error-checking on individual CallableTasks and in tasks-batches related to the creation and upload of all the data for the "attempt" and "payload" tables. So, if no data could be uploaded for one or both tables, no "load"-queries will be executed for that/those tables. 2022-12-09 12:46:06 +02:00
Task.java Add the "Datasource" inside the "Task" class and include it in the Assignment. 2021-05-20 02:50:50 +03:00
UrlReport.java - Add the "isControllerAlive"-endpoint. 2021-09-23 15:08:52 +03:00
WorkerInfo.java - Use separate HDFS subdirectories for each worker in order to avoid seeing exceptions about "empty hdfs directory" when "loading" data to the database, because one worker has loaded data generated by multiple workers (since we use only 1 load operation for multiple parquet files). 2023-05-15 13:12:20 +03:00
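The commit message on ParquetReport.java and SumParquetSuccess.java describes a rule: each parquet-creation/upload task is checked on its own, and a table's "load"-query runs only if at least one upload for that table succeeded. Below is a minimal Java 16+ sketch of that rule under stated assumptions. It is not the repository's code: the `Table` enum, the `UploadTask` record and the `tablesWithUploadedData` method are hypothetical names, and the real parquet writing, HDFS upload and Impala load-queries are replaced by stub callables and print statements.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class UploadCheckSketch {

    // The two tables named in the commit message.
    enum Table { ATTEMPT, PAYLOAD }

    // Hypothetical unit of work: create one parquet file for a table and upload it.
    // `work` returns true if the upload succeeded.
    record UploadTask(Table table, Callable<Boolean> work) {}

    // Runs every task and returns the tables for which at least one upload succeeded.
    // A task that returns false or throws counts as a failure for its own table only.
    static Set<Table> tablesWithUploadedData(List<UploadTask> tasks, ExecutorService executor)
            throws InterruptedException {
        List<Future<Boolean>> futures = new ArrayList<>(tasks.size());
        for (UploadTask task : tasks) {
            futures.add(executor.submit(task.work()));
        }
        Set<Table> succeeded = EnumSet.noneOf(Table.class);
        for (int i = 0; i < tasks.size(); i++) {
            try {
                if (Boolean.TRUE.equals(futures.get(i).get())) {
                    succeeded.add(tasks.get(i).table());
                }
            } catch (ExecutionException e) {
                // One failed task does not abort the batch; report it and keep checking.
                System.err.println("Upload task for " + tasks.get(i).table() + " failed: " + e.getCause());
            }
        }
        return succeeded;
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService executor = Executors.newFixedThreadPool(4);
        try {
            List<UploadTask> tasks = List.of(
                    new UploadTask(Table.ATTEMPT, () -> true),
                    new UploadTask(Table.ATTEMPT, () -> { throw new IllegalStateException("HDFS unreachable"); }),
                    new UploadTask(Table.PAYLOAD, () -> false));
            Set<Table> toLoad = tablesWithUploadedData(tasks, executor);
            for (Table table : Table.values()) {
                // Stand-in for issuing (or skipping) the Impala "load"-query for this table.
                if (toLoad.contains(table)) {
                    System.out.println("Would run load-query for " + table);
                } else {
                    System.out.println("Skipping load-query for " + table + ": no data uploaded");
                }
            }
        } finally {
            executor.shutdown();
        }
    }
}
```

Using an `EnumSet` keyed by table keeps the per-table decision explicit, so a failure in the "payload" uploads cannot block the "attempt" load, or vice versa.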