The Worker app of the PDF Aggregation Service.
Lampros Smyrnaios d37cd738a0 Refactor the full-texts deletion process to reduce storage space and complexity:
- Delete each assignments-batch's full-texts once the whole procedure for that batch has finished, whether it succeeded or not.
- Do not check for remaining files when the Worker shuts down, since the files are deleted anyway when the handling fails.

The full-texts do not need to be kept in case of an error, since the Controller will reassign the non-downloaded id-url records to some worker (possibly a different one), where the files will be downloaded and handled again.

Also, change "assignmentsNumsHandled" to hold data only for assignments that are handled all the way, including the upload of the full-texts from the Controller and the insertion of the WorkerReport into the database.
2022-12-07 12:29:05 +02:00

README.md

UrlsWorker

The Worker application requests assignments from the Controller and processes them, downloading the available full-texts.
It then posts the results to the Controller, which in turn requests from the Worker, in batches, the full-texts that have not already been found by other workers.
For each batch, the Worker responds by compressing and sending the requested files.
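
The batch exchange described above can be sketched roughly as follows. This is only an illustration in Python, not the Worker's actual code (the project itself is built with Gradle): the function names, the in-memory `files` mapping and the choice of ZIP as the compression format are all assumptions made for the example, since the README does not state how the files are selected, batched or compressed.

```python
import io
import zipfile
from typing import Dict, Iterator, List, Set


def select_missing(reported: List[str], already_found: Set[str]) -> List[str]:
    """Controller side (hypothetical): keep only the full-texts that
    no other worker has already provided, preserving the reported order."""
    return [name for name in reported if name not in already_found]


def split_into_batches(names: List[str], batch_size: int) -> Iterator[List[str]]:
    """Controller side (hypothetical): request the missing files in batches."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(names), batch_size):
        yield names[start:start + batch_size]


def compress_batch(files: Dict[str, bytes], names: List[str]) -> bytes:
    """Worker side (hypothetical): compress the requested files of one batch.
    ZIP is an assumption; the real compression format is not stated."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name in names:
            archive.writestr(name, files[name])
    return buffer.getvalue()


if __name__ == "__main__":
    # Stand-in for the full-texts the Worker downloaded.
    downloaded = {"a.pdf": b"A", "b.pdf": b"B", "c.pdf": b"C"}
    missing = select_missing(list(downloaded), already_found={"b.pdf"})
    for batch in split_into_batches(missing, batch_size=1):
        payload = compress_batch(downloaded, batch)
        print(batch, len(payload), "bytes")
```

Keeping the selection and batching on the Controller side and the compression on the Worker side mirrors the division of work in the description: the Worker only ever packs the files it is asked for.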

To install and run the application:

  • Run git clone and then cd UrlsWorker.
  • [Optional] Create the file inputData.txt, which contains a single line with the workerId, the maxAssignmentsLimitPerBatch, the maxAssignmentsBatchesToHandleBeforeRestart, the Controller's base api-url and the shutdownOrCancelCode, all separated by commas ",".
    For example: worker_1,1000,0,http://IP:PORT/api/,stopOrCancelCode
    The shutdownOrCancelCode acts as a kind of "auth-code" for incoming "shutdown" and "cancel-shutdown" requests.
  • Execute the installAndRun.sh script.
    If inputData.txt does not exist, the script prompts the user for the required data and then creates the file.
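
To make the layout of inputData.txt concrete, here is a minimal Python sketch that splits the single line into its five fields. It is not how the Worker or installAndRun.sh reads the file; the `WorkerInput` class and `parse_input_data` function are hypothetical names introduced only for this illustration, and the sketch assumes that none of the values contains a comma.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerInput:
    """Hypothetical container for the five values in inputData.txt."""
    worker_id: str
    max_assignments_limit_per_batch: int
    max_assignments_batches_to_handle_before_restart: int
    controller_base_api_url: str
    shutdown_or_cancel_code: str


def parse_input_data(line: str) -> WorkerInput:
    """Split one comma-separated line into the five expected fields."""
    fields = [field.strip() for field in line.strip().split(",")]
    if len(fields) != 5:
        raise ValueError(f"expected 5 comma-separated fields, got {len(fields)}")
    worker_id, limit, batches, base_url, code = fields
    # int() raises ValueError if a numeric field is not a valid integer.
    return WorkerInput(worker_id, int(limit), int(batches), base_url, code)


if __name__ == "__main__":
    # The example line from the instructions above.
    print(parse_input_data("worker_1,1000,0,http://IP:PORT/api/,stopOrCancelCode"))
```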

Notes:

  • If the "maxAssignmentsBatchesToHandleBeforeRestart" is zero or negative, then an infinite number of assignments-batches will be handled.
  • The above script installs the PublicationsRetriever as a library and then compiles and runs the whole application.
  • If you just want to run the app, without re-installing it, run the script with the argument "1": ./installAndRun.sh 1
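
The rule for "maxAssignmentsBatchesToHandleBeforeRestart" in the first note can be written as a small predicate. The Python sketch below is a reading of that note only, with a hypothetical function name, and it assumes the Worker counts the batches it has handled since it started; the actual check inside the Worker may be structured differently.

```python
def may_handle_another_batch(batches_handled: int, max_batches_before_restart: int) -> bool:
    """Return True if the Worker may take on one more assignments-batch.

    A limit of zero or less means "no limit", as described in the notes.
    """
    if max_batches_before_restart <= 0:
        return True
    return batches_handled < max_batches_before_restart


if __name__ == "__main__":
    for limit in (0, -1, 3):
        allowed = [may_handle_another_batch(n, limit) for n in range(5)]
        print(f"limit={limit}: {allowed}")
```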