The Controller app of the PDF Aggregation Service.
Lampros Smyrnaios 15224c6468 Improve performance in the "getUrls"-endpoint, and more:
- Optimize the "findAssignmentsQuery" by using an inner limit (larger than the outer one).
- Save a lot of time when inserting the assignments into the database, by using a temporary table to hold the new assignments, so that they are easily accessible both from the Controller (which processes them and sends them to the Worker) and from the database itself, which "imports" them into the "assignment"-table.
- Replace "Date" with "Timestamp", in order to hold more detailed information.
- Code cleanup.
2021-11-30 19:59:46 +02:00


UrlsController

This is the Controller's application.
It receives requests from the workers, constructs an assignments-list with data retrieved from the database and returns that list to the workers.
Then it receives the "WorkerReports" and writes them into the database.
The database used is Impala.
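The request/response cycle described above can be sketched in Java. This is a minimal, in-memory sketch under stated assumptions, not the service's actual implementation: a map stands in for the Impala tables, and the class, record and method names ("ControllerFlowSketch", "Assignment", "addWorkerReport", etc.) are hypothetical. Only "getUrls" echoes the endpoint name mentioned in the commit message; its signature here is assumed.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, in-memory stand-in for the Controller's flow.
// The real service reads from and writes to Impala; here plain collections play that role.
final class ControllerFlowSketch {

    // Hypothetical record for one URL handed to a worker.
    record Assignment(String urlId, String url, String workerId) {}

    // Hypothetical record for one result a worker reports back.
    record WorkerReport(String urlId, String status) {}

    // Stand-in "table" of URLs that have not been assigned yet (id -> url), in insertion order.
    private final Map<String, String> pendingUrls = new LinkedHashMap<>();
    // Stand-in "table" that stores the reports received from workers.
    private final List<WorkerReport> storedReports = new ArrayList<>();

    void addPendingUrl(String urlId, String url) {
        pendingUrls.put(urlId, url);
    }

    // Builds an assignments-list of at most 'limit' URLs for the requesting worker
    // and removes them from the pending set, so the same URL is not handed out twice.
    List<Assignment> getUrls(String workerId, int limit) {
        List<Assignment> assignments = new ArrayList<>();
        Iterator<Map.Entry<String, String>> it = pendingUrls.entrySet().iterator();
        while (it.hasNext() && assignments.size() < limit) {
            Map.Entry<String, String> entry = it.next();
            assignments.add(new Assignment(entry.getKey(), entry.getValue(), workerId));
            it.remove();
        }
        return assignments;
    }

    // Receives a worker's reports and "writes them into the database" (here: a list).
    void addWorkerReport(List<WorkerReport> reports) {
        storedReports.addAll(reports);
    }

    int storedReportCount() {
        return storedReports.size();
    }
}
```

Removing URLs from the pending set as they are assigned is one simple way to keep two workers from receiving the same URL; the real service instead persists assignments in its "assignment"-table.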
[...]

To install and run the application, first clone the repository with "git clone". Then, provide a file named "S3_minIO_credentials.txt" inside the working directory.
In the "S3_minIO_credentials.txt" file, you should provide the endpoint, the accessKey, the secretKey, the region and the bucket, in that order, separated by commas.
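Reading that file can be sketched as follows. This is a minimal Java sketch, not the Controller's actual code: it assumes the file holds exactly five comma-separated values and that whitespace (including a trailing newline) around each value should be ignored. The class, record and method names are hypothetical. Because it splits on every comma, it cannot handle a value that itself contains a comma.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: parses "S3_minIO_credentials.txt", whose values are
// endpoint, accessKey, secretKey, region and bucket, in that order, separated by commas.
final class S3CredentialsSketch {

    record S3Credentials(String endpoint, String accessKey, String secretKey,
                         String region, String bucket) {}

    // Assumption: surrounding whitespace is not part of any value, so each one is trimmed.
    static S3Credentials parse(String content) {
        // A limit of -1 keeps empty trailing fields, so a missing value is counted, not dropped.
        String[] parts = content.trim().split(",", -1);
        if (parts.length != 5) {
            throw new IllegalArgumentException(
                    "Expected 5 comma-separated values, found " + parts.length);
        }
        for (int i = 0; i < parts.length; i++) {
            parts[i] = parts[i].trim();
        }
        return new S3Credentials(parts[0], parts[1], parts[2], parts[3], parts[4]);
    }

    // Reads the file from the given working directory and parses it.
    static S3Credentials load(Path workingDir) throws IOException {
        return parse(Files.readString(workingDir.resolve("S3_minIO_credentials.txt")));
    }
}
```

For example, a file containing "http://localhost:9000,myAccessKey,mySecretKey,us-east-1,mybucket" would yield "mybucket" as the bucket; the values shown are placeholders.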
Afterwards, execute the "installAndRun.sh" script.
If you only want to run the app, run the script with the argument "1": ./installAndRun.sh 1