The Worker app of the PDF Aggregation Service.
Lampros Smyrnaios 0db35a83e7 - Reduce memory consumption and fix a potential issue, where many "already-retrieved" full-texts would be already deleted (or in different directories, for a short time), as they belonged to a previous assignments-batch (this case is now possible, after the following fix).
- Fix a bug, causing a missing character in the "alreadyDownloaded" full-text fileName, which in turn caused the file-data of that record to not get updated with the file-data of the record for which the same file was initially downloaded for.
2 years ago
gradle/wrapper - Optimize the "FileZipper.zipMultipleFilesAndGetZip()" and "FileZipper.zipAFile()" methods. 2 years ago
scripts Initial commit of UrlsWorker. 3 years ago
src - Reduce memory consumption and fix a potential issue, where many "already-retrieved" full-texts would be already deleted (or in different directories, for a short time), as they belonged to a previous assignments-batch (this case is now possible, after the following fix). 2 years ago
.gitignore - Update the "installAndRun.sh" script to be able to just run the app (without re-installing), if you want. 3 years ago
README.md - Update the "installAndRun.sh": 3 years ago
build.gradle - Allow the user to set the "maxAssignmentsLimitPerBatch" value. 2 years ago
installAndRun.sh - Allow the user to set the "maxAssignmentsLimitPerBatch" value. 2 years ago
settings.gradle - Fix the project's name inside "settings.gradle". 3 years ago

README.md

UrlsWorker

This is the Worker application.
It requests assignments from the Controller and processes them, i.e. it tries to retrieve the full-texts of the assigned URLs.
It then posts the results back to the Controller, which in turn stores them in a database.
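The request, process and post cycle described above can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not UrlsWorker's actual code: the types Assignment, Result, ControllerClient and Processor and the method runOneBatch are hypothetical names, the HTTP transport to the Controller is abstracted behind an interface, and records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;

public class WorkerLoopSketch {

    // Hypothetical types: UrlsWorker's real assignment/result classes may look different.
    public record Assignment(String id, String url) {}
    public record Result(String assignmentId, boolean fullTextRetrieved) {}

    // Hypothetical view of the Controller's API; the real Worker talks to it over HTTP.
    public interface ControllerClient {
        List<Assignment> requestAssignments(String workerId, int maxAssignments);
        void postResults(String workerId, List<Result> results);
    }

    // Stand-in for the actual processing step (full-text retrieval).
    @FunctionalInterface
    public interface Processor {
        Result process(Assignment assignment);
    }

    // One request -> process -> post cycle. Returns how many assignments were processed
    // (0 when the Controller had nothing to hand out, in which case nothing is posted).
    public static int runOneBatch(ControllerClient controller, Processor processor,
                                  String workerId, int maxAssignments) {
        List<Assignment> batch = controller.requestAssignments(workerId, maxAssignments);
        if (batch.isEmpty()) {
            return 0;
        }
        List<Result> results = new ArrayList<>(batch.size());
        for (Assignment assignment : batch) {
            results.add(processor.process(assignment));
        }
        // The Controller, not the Worker, is responsible for storing these in the database.
        controller.postResults(workerId, results);
        return results.size();
    }
}
```

A real Worker would presumably call such a method repeatedly. The hypothetical maxAssignments parameter likely plays the role of the user-configurable "maxAssignmentsLimitPerBatch" value mentioned in the commit history, though that mapping is an assumption.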

To install and run the application:

  • Run git clone and then cd UrlsWorker.
  • Create the file S3_minIO_credentials.txt, which contains a single line with the S3_url, S3_username, S3_password, S3_server_region and S3_bucket values, separated by commas (,).
  • [Optional] Create the file inputData.txt, which contains a single line with the workerId and the Controller's base API URL, separated by a comma (,). For example: worker_1,http://IP:PORT/api/
  • Execute the installAndRun.sh script. If inputData.txt does not exist, the script asks for the current worker's ID and the Controller's URL and creates inputData.txt from them.
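Both files above consist of a single comma-separated line. The following minimal Java sketch shows how such lines could be read and validated; it is not the parser UrlsWorker actually uses. The class, record and method names are hypothetical, and it assumes that no value contains a comma itself (a password with a comma would break this format) and that whitespace around values should be ignored. Records require Java 16 or later.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class ConfigFilesSketch {

    // The five values of S3_minIO_credentials.txt, in the order the README lists them.
    public record S3Credentials(String url, String username, String password,
                                String region, String bucket) {}

    // The two values of inputData.txt.
    public record InputData(String workerId, String controllerBaseUrl) {}

    // Splits one comma-separated line into exactly `expected` trimmed, non-empty values.
    static String[] splitLine(String line, int expected) {
        // A limit of -1 keeps trailing empty values, so "a,b," is rejected rather than accepted.
        String[] parts = line.trim().split(",", -1);
        if (parts.length != expected) {
            throw new IllegalArgumentException(
                    "Expected " + expected + " comma-separated values, got " + parts.length);
        }
        for (int i = 0; i < parts.length; i++) {
            parts[i] = parts[i].trim();
            if (parts[i].isEmpty()) {
                throw new IllegalArgumentException("Value #" + (i + 1) + " is empty");
            }
        }
        return parts;
    }

    public static S3Credentials parseS3Credentials(String line) {
        String[] p = splitLine(line, 5);
        return new S3Credentials(p[0], p[1], p[2], p[3], p[4]);
    }

    public static InputData parseInputData(String line) {
        String[] p = splitLine(line, 2);
        return new InputData(p[0], p[1]);
    }

    // Reads the single line that each of these files is expected to contain.
    public static String readFirstLine(Path file) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(file)) {
            String line = reader.readLine();
            if (line == null) {
                throw new IOException(file + " is empty");
            }
            return line;
        }
    }
}
```

For example, parseInputData(readFirstLine(Path.of("inputData.txt"))) would yield a workerId of "worker_1" for the sample line shown in the list. Failing fast on a wrong value count makes a malformed file obvious at startup rather than when the first S3 upload or Controller request fails.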

The installAndRun.sh script installs PublicationsRetriever as a library, then compiles and runs the whole application.
To just run the app (without re-installing), run the script with the argument "1": ./installAndRun.sh 1