UrlsController/src/main/java/eu/openaire/urls_controller/models
Lampros Smyrnaios 08de530f03 Various improvements:
- Handle the case when "fileUtils.constructS3FilenameAndUploadToS3()" returns "null", in "processBulkImportedFile()".
- Avoid an "IllegalArgumentException" in "Lists.partition()" when the number of files to bulkImport is smaller than the number of threads available to handle them.
- Include the last directory's "/" divider in the fileDIR group of "FILEPATH_ID_EXTENSION" regex (renamed from "FILENAME_ID_EXTENSION").
- Fix an incomplete log-message.
- Provide the "fileLocation" argument in the "DocFileData" constructor, in "processBulkImportedFile()", even though it's not used after.
4 weeks ago
Assignment.java Avoid assigning the same publications multiple times to the Workers, after the recent "parallelization enhancement". 10 months ago
Attempt.java - Upgrade the results-loading process: Instead of making thousands of sql-insert requests to Impala, we now write the results to parquet files, upload them to HDFS and then import the data into the Impala tables with just 2 requests. This results in a huge performance improvement. 1 year ago
BulkImportReport.java - Optimize writing to the Bulk-import-report file. 1 month ago
BulkImportResponse.java Update Bulk-Import API: 9 months ago
Datasource.java Add the "Datasource" inside the "Task" class and include it in the Assignment. 3 years ago
DocFileData.java New feature: BulkImport full-text files from compatible datasources. 12 months ago
Error.java - Process the Error of PDF-aggregation. Distinguish between "couldRetry" and "noRetry" cases. 3 years ago
FileLocationData.java Various improvements: 4 weeks ago
ParquetReport.java - Apply error-checking on individual CallableTasks and in tasks-batches related to the creation and upload of all the data related to the "attempt" and "payload" table. So, if no data could be uploaded for one or both tables, no "load"-queries will be executed for that/those tables. 1 year ago
Payload.java - Add error-handling for the case when no payloads could be associated with a specific url which should have been in the hashMultiMap in "addUrlReportsByMatchingRecordsFromBacklog". 2 months ago
SumParquetSuccess.java - Apply error-checking on individual CallableTasks and in tasks-batches related to the creation and upload of all the data related to the "attempt" and "payload" table. So, if no data could be uploaded for one or both tables, no "load"-queries will be executed for that/those tables. 1 year ago
Task.java Add the "Datasource" inside the "Task" class and include it in the Assignment. 3 years ago
UrlReport.java - Add the "isControllerAlive"-endpoint. 3 years ago
WorkerInfo.java - Use separate HDFS subdirectories for each worker in order to avoid seeing exceptions about "empty hdfs directory" when "loading" data to the database, because one worker has loaded data generated by multiple workers (since we use only 1 load operation for multiple parquet files). 12 months ago
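
The "Lists.partition()" item in the latest commit message can be illustrated with a short sketch. Guava's "Lists.partition(list, size)" rejects a size of 0 with an "IllegalArgumentException", and a size of 0 is what integer division produces when there are fewer files than threads. The code below is not the project's implementation: the class name, the method name "partitionForThreads" and the way the batch size is derived are assumptions made for illustration, and plain "subList" calls stand in for Guava so the example has no dependencies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PartitionSketch {

    // Hypothetical helper (not from the repository): splits "items" into consecutive
    // batches whose size is items.size() / numThreads, but never smaller than 1.
    static <T> List<List<T>> partitionForThreads(List<T> items, int numThreads) {
        if (numThreads < 1)
            throw new IllegalArgumentException("numThreads must be at least 1");

        List<List<T>> batches = new ArrayList<>();
        if (items.isEmpty())
            return batches;   // Nothing to split.

        // Integer division yields 0 when there are fewer items than threads.
        // Clamping to 1 keeps the batch size valid in that case.
        int batchSize = Math.max(1, items.size() / numThreads);

        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        // Fewer files than threads: each file ends up in its own batch.
        List<String> files = Arrays.asList("a.pdf", "b.pdf");
        System.out.println(partitionForThreads(files, 8));
    }
}
```

With floor division the last batch can be a short remainder, so the number of batches may exceed the number of threads by one. A thread pool simply queues the extra batch.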