The Worker app of the PDF Aggregation Service.
Lampros Smyrnaios b63ad87d00 Bug fixes:
- Fix a bug where, if getting the assignments from the Controller took too long (possible when too many workers request at the same time or when the database responds slowly), the Worker's scheduler would request new assignments in the meantime.
- Fix a bug where, if "maxAssignmentsBatchesToHandleBeforeRestart" was set, the Worker's scheduler could request another batch right before the Worker was about to shut down.
- Fix a bug where the condition for clearing the over-sized data-structures was based on the "assignmentRequestCounter" sent by the Controller (which is increased on each request by any worker, not per individual worker) instead of the "numHandledAssignmentsBatches" kept by each individual worker. This would result in a much earlier cleanup, depending on the number of Workers.
2022-02-19 17:09:02 +02:00
gradle/wrapper - Reduce memory-consumption in the long-run, by clearing some underlying data-structures after a threshold. 2022-02-18 20:02:34 +02:00
src Bug fixes: 2022-02-19 17:09:02 +02:00
.gitignore - Update the "installAndRun.sh" script to be able to just run the app (without re-installing), if you want. 2021-09-09 16:28:58 +03:00
README.md Update the README.md 2022-02-07 20:59:10 +02:00
build.gradle - Fix not prioritizing the gradle version defined inside the "installAndRun.sh" script. 2022-01-21 15:19:52 +02:00
installAndRun.sh - Reduce memory-consumption in the long-run, by clearing some underlying data-structures after a threshold. 2022-02-18 20:02:34 +02:00
settings.gradle - Fix the project's name inside "settings.gradle". 2021-09-22 17:06:30 +03:00

README.md

UrlsWorker

The Worker application requests assignments from the Controller and processes them, downloading the available full-texts.
Then, it posts the results to the Controller, which in turn requests from the Worker, in batches, the full-texts that have not already been found by other workers.
The Worker responds by compressing and sending the requested files in each batch.
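
The compression step can be illustrated with a minimal sketch. The README does not say which archive format the Worker uses, so ZIP is assumed here only because it ships with the Java standard library. The class name BatchCompressor and the method zipBatch are hypothetical and are not taken from the project.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Hypothetical helper: packs one batch of requested full-texts into a single archive. */
final class BatchCompressor {

    /**
     * Compresses the given files (file name -> file content) into one in-memory ZIP archive.
     * ZIP is an assumption; the real Worker may use a different format.
     */
    static byte[] zipBatch(Map<String, byte[]> files) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(buffer)) {
            for (Map.Entry<String, byte[]> file : files.entrySet()) {
                zos.putNextEntry(new ZipEntry(file.getKey())); // one entry per full-text
                zos.write(file.getValue());
                zos.closeEntry();
            }
        } catch (IOException e) {
            // Writing to memory should not fail; rethrow unchecked to keep callers simple.
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }
}
```

The returned byte array could then be sent as the body of the response to the Controller's batch request.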

To install and run the application:

  • Run git clone and then cd UrlsWorker.
  • [Optional] Create the file inputData.txt, which contains just one line with the workerId, the maxAssignmentsLimitPerBatch, the maxAssignmentsBatchesToHandleBeforeRestart and the Controller's base api-url, all separated by commas (,). For example: worker_1,1000,0,http://IP:PORT/api/.
  • Execute the installAndRun.sh script. If the inputData.txt file does not exist, the script will request the required data from the user and then create the file.
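
As an illustration of the inputData.txt format described above, the following sketch parses the single comma-separated line into its four values. The class WorkerInput and its parse method are hypothetical names chosen for this example; the project's own parsing code may differ.

```java
/** Hypothetical holder for the four values stored in inputData.txt. */
final class WorkerInput {
    final String workerId;
    final int maxAssignmentsLimitPerBatch;
    final int maxAssignmentsBatchesToHandleBeforeRestart;
    final String controllerBaseUrl;

    private WorkerInput(String workerId, int maxAssignmentsLimitPerBatch,
                        int maxAssignmentsBatchesToHandleBeforeRestart, String controllerBaseUrl) {
        this.workerId = workerId;
        this.maxAssignmentsLimitPerBatch = maxAssignmentsLimitPerBatch;
        this.maxAssignmentsBatchesToHandleBeforeRestart = maxAssignmentsBatchesToHandleBeforeRestart;
        this.controllerBaseUrl = controllerBaseUrl;
    }

    /** Parses one line such as "worker_1,1000,0,http://IP:PORT/api/". */
    static WorkerInput parse(String line) {
        String[] parts = line.trim().split(",");
        if (parts.length != 4) {
            throw new IllegalArgumentException(
                    "Expected 4 comma-separated values, but got " + parts.length);
        }
        return new WorkerInput(
                parts[0].trim(),
                Integer.parseInt(parts[1].trim()), // throws NumberFormatException if not a number
                Integer.parseInt(parts[2].trim()),
                parts[3].trim());
    }
}
```

For the example line from the list above, WorkerInput.parse would yield the workerId "worker_1", a per-batch limit of 1000 and a restart threshold of 0.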

Notes:

  • If the "maxAssignmentsBatchesToHandleBeforeRestart" is zero or negative, then an infinite number of assignments-batches will be handled.
  • The above script installs the PublicationsRetriever as a library and then compiles and runs the whole application.
  • If you want to just run the app, then run the script with the argument "1": ./installAndRun.sh 1.
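
The rule for "maxAssignmentsBatchesToHandleBeforeRestart" from the notes above can be sketched as a small check. The class BatchLimit and the method limitReached are hypothetical; only the two quoted setting and counter names come from this page, and the Worker's real logic may be organized differently.

```java
/** Hypothetical helper showing how the restart threshold could be evaluated. */
final class BatchLimit {

    /**
     * Returns true when the Worker has handled enough assignments-batches and should
     * stop requesting new ones. A zero or negative maximum means "no limit".
     */
    static boolean limitReached(long numHandledAssignmentsBatches,
                                int maxAssignmentsBatchesToHandleBeforeRestart) {
        if (maxAssignmentsBatchesToHandleBeforeRestart <= 0) {
            return false; // infinite number of batches allowed
        }
        return numHandledAssignmentsBatches >= maxAssignmentsBatchesToHandleBeforeRestart;
    }
}
```

The counter passed in is meant to be the per-worker count of handled batches, not a counter shared across all workers.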