Add Collector Plugin for Zenodo Dumps #516

Merged
claudio.atzori merged 3 commits from zenodo_dump_collection into beta 2024-12-06 13:51:14 +01:00

This pull request introduces a new collector plugin designed to handle Zenodo dumps. The plugin expects the following configuration in the APIDescriptor:

  • baseURL: the URL pointing to the latest dump tarball (e.g., https://cernbox.cern.ch/remote.php/dav/public-files/hoJyvfqO6hDjDeh/zenodo-2024-09-18_155657.tar.gz).
  • hdfsURI (inside params): the main HDFS URI.

The plugin performs the following tasks:

  • Downloads the tar.gz file from the specified baseURL.
  • Extracts the contents of the tar.gz file.
  • Iterates over all files inside the archive and processes them as needed.

DNET Integration
I am unsure of the steps required to integrate this new plugin into the DNET infrastructure. Unfortunately, I don’t recall the legacy code details or the integration process.
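The three tasks in the description (download from baseURL, extract the tar.gz, iterate over its files) can be sketched in Java, the language of the reviewed code. This is a hypothetical, self-contained illustration, not the plugin's actual code: the names `ZenodoDumpSketch`, `collect` and `iterateTarGz` are invented, the archive is streamed straight from baseURL instead of being saved first, and writing the entries to the cluster at hdfsURI is left to the callback. Because the JDK has no tar reader, the sketch hand-rolls a minimal POSIX ustar parser; it skips GNU/PAX long-name entries and assumes each file fits in memory. A production plugin would more likely rely on a library such as Apache Commons Compress.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.function.BiConsumer;
import java.util.zip.GZIPInputStream;

/** Hypothetical sketch: stream a .tar.gz dump and hand each regular file to a callback. */
final class ZenodoDumpSketch {
    private static final int BLOCK = 512; // tar archives are made of 512-byte blocks

    /** Opens baseUrl and processes the archive as a stream, without a local copy. */
    static void collect(URI baseUrl, BiConsumer<String, byte[]> onFile) {
        try (InputStream in = baseUrl.toURL().openStream()) {
            iterateTarGz(in, onFile);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Gunzips the stream and invokes onFile(name, content) for every regular file entry. */
    static void iterateTarGz(InputStream gzipped, BiConsumer<String, byte[]> onFile) throws IOException {
        try (DataInputStream tar = new DataInputStream(new GZIPInputStream(gzipped))) {
            byte[] header = new byte[BLOCK];
            while (true) {
                try {
                    tar.readFully(header);
                } catch (EOFException e) {
                    return; // archive ended without the usual zero-block marker
                }
                if (isZeroBlock(header)) {
                    return; // end-of-archive marker
                }
                String name = cString(header, 0, 100);
                // POSIX ustar archives may split long paths into prefix + name.
                if (cString(header, 257, 6).equals("ustar")) {
                    String prefix = cString(header, 345, 155);
                    if (!prefix.isEmpty()) {
                        name = prefix + "/" + name;
                    }
                }
                long size = Long.parseLong(cString(header, 124, 12).trim(), 8); // size is octal
                if (size > Integer.MAX_VALUE) {
                    throw new IOException("entry too large for this sketch: " + name);
                }
                byte type = header[156];
                byte[] data = new byte[(int) size];
                tar.readFully(data); // always consume the data, even for skipped entry types
                // Entry data is padded up to the next 512-byte boundary.
                tar.readFully(new byte[(int) ((BLOCK - size % BLOCK) % BLOCK)]);
                if (type == '0' || type == 0) { // regular file; directories, links, PAX headers are skipped
                    onFile.accept(name, data);
                }
            }
        }
    }

    /** Reads a NUL-terminated field from a tar header. */
    private static String cString(byte[] b, int off, int len) {
        int end = off;
        while (end < off + len && b[end] != 0) {
            end++;
        }
        return new String(b, off, end - off, StandardCharsets.UTF_8);
    }

    private static boolean isZeroBlock(byte[] b) {
        for (byte x : b) {
            if (x != 0) return false;
        }
        return true;
    }
}
```

Streaming avoids keeping a local copy of the tarball at all; the trade-off is that a dropped connection means restarting the download from the beginning.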
sandro.labruzzo added 2 commits 2024-12-04 14:02:03 +01:00
claudio.atzori was assigned by sandro.labruzzo 2024-12-04 14:04:20 +01:00
miriam.baglioni was assigned by sandro.labruzzo 2024-12-04 14:04:26 +01:00
giambattista.bloisi was assigned by sandro.labruzzo 2024-12-04 14:04:32 +01:00
sandro.labruzzo added 1 commit 2024-12-04 15:04:08 +01:00
miriam.baglioni reviewed 2024-12-06 11:20:05 +01:00
@@ -0,0 +83,4 @@
            org.apache.hadoop.io.IOUtils.closeStream(gzipInputStream);
        }
    } catch (Exception e) {
        throw new CollectorException(e);

you should delete the temporary file
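Assuming the reviewed code downloads the tarball to a local temporary file before unpacking it, the cleanup requested here is usually placed in a `finally` block so the file is removed on success and on failure alike. A minimal sketch under that assumption: `TempDumpDownload` and `withDownloadedDump` are hypothetical names, and `UncheckedIOException` stands in for the plugin's `CollectorException`. If the file is staged on HDFS instead, the equivalent cleanup would be Hadoop's `FileSystem.delete(path, false)`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.function.Consumer;

/** Hypothetical sketch of the temporary-file cleanup requested in the review. */
final class TempDumpDownload {
    /**
     * Downloads baseUrl to a local temporary file, lets the caller process it,
     * and always deletes the file afterwards, even when processing throws.
     */
    static void withDownloadedDump(URI baseUrl, Consumer<Path> process) {
        Path tmp = null;
        try {
            tmp = Files.createTempFile("zenodo-dump-", ".tar.gz");
            try (InputStream in = baseUrl.toURL().openStream()) {
                // createTempFile already created the file, so it must be replaced.
                Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            }
            process.accept(tmp);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (tmp != null) {
                try {
                    Files.deleteIfExists(tmp);
                } catch (IOException e) {
                    // Best effort: a failed cleanup should not mask the original error.
                }
            }
        }
    }
}
```

Swallowing the exception from `deleteIfExists` keeps a failed cleanup from hiding the error that triggered the `finally` block; logging it would be a reasonable addition.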

claudio.atzori merged commit 5c7f7fb3b8 into beta 2024-12-06 13:51:14 +01:00