The Worker app of the PDF Aggregation Service.
Latest commit 212f8f377d by Lampros Smyrnaios (2021-11-30 06:57:51 +02:00):
- Set "ConnSupportUtils.shouldBlockMost5XXDomains" to "false" and call the "LoaderAndChecker.setCouldRetryRegex()" method. Together, these make sure that, for HTTP-5XX errors, only the 511-domains get blocked and only the 511-urls get labeled with "noRetry".
- Improve performance and reduce memory consumption by calling the "ConnSupportUtils.setKnownMimeTypes()" method only once, in the constructor.
- Code cleanup.


UrlsWorker

This is the Worker application.
It requests assignments from the Controller and processes them.
It then posts the results to the Controller, which, in turn, stores them in a database.

To install and run the application:

  • Run git clone for this repository and then cd UrlsWorker.
  • Create the file S3_minIO_credentials.txt, which contains just one line with the S3_url, S3_username, S3_password, S3_server_region and S3_bucket, separated by commas (,).
  • [Optional] Create the file inputData.txt, which contains just one line with the workerId and the Controller's base api-url, separated by a comma (,). For example: worker_1,http://IP:PORT/api/.
  • Execute the installAndRun.sh script. If the inputData.txt file does not exist, the script will ask for the current worker's ID and the Controller's URL, and it will create the inputData.txt file.
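
As a rough sketch of the file-creation steps above, the two files could be written from inside the UrlsWorker directory as shown below. Every value is a placeholder chosen for illustration (the S3 host, username, password, region and bucket are not real settings), and the field order simply follows the order in which the fields are listed above.

```shell
# Placeholder values only: replace each field with your own settings.
# Assumed field order: S3_url,S3_username,S3_password,S3_server_region,S3_bucket
printf '%s\n' 'http://S3_HOST:PORT,my_username,my_password,my_region,my_bucket' > S3_minIO_credentials.txt

# Optional file: workerId and the Controller's base api-url, on a single line.
printf '%s\n' 'worker_1,http://IP:PORT/api/' > inputData.txt
```

Using printf with a single format string keeps each file to exactly one line, which is what both files are described as containing.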

The script installs PublicationsRetriever as a library, then compiles and runs the whole application.
To just run the app without re-installing, run the script with the argument "1": ./installAndRun.sh 1.
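
The two modes can be pictured with the small sketch below. It is purely illustrative: describe_steps is a hypothetical helper, not part of installAndRun.sh, and the assumption that the just-run mode skips every step except running the app is a guess; the actual script may behave differently.

```shell
# Hypothetical helper: prints the steps assumed to happen for a given first argument.
# It does not install, build or run anything.
describe_steps() {
  if [ "${1:-}" = "1" ]; then
    # Assumed just-run mode: skip the installation, only run the app.
    echo "run"
  else
    # Assumed default mode: install PublicationsRetriever, compile, then run.
    echo "install-library compile run"
  fi
}

describe_steps 1   # corresponds to: ./installAndRun.sh 1
describe_steps     # corresponds to: ./installAndRun.sh
```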