The Controller app of the PDF Aggregation Service.
Lampros Smyrnaios 1111c850b9 - Add support for more than one full-text per id. Allow recognizing fileName additions: "id(1).pdf", "id(2).pdf", etc.
- Fix not giving the databaseName in the "ImpalaController.get10PublicationIdsTest()".
- Improve consistency in the "maxAttemptsPerRecord" value, among different threads. Also, reduce the value-increase by one.
- Check if the tableName string is empty, in the "mergeParquetFiles".
- Improve error-logging.
- Set some local variables to "final", optimizing code-execution by the JVM.
2022-02-07 13:57:09 +02:00
gradle/wrapper - Workaround a bug of Impala-JDBC-Driver, when creating insert-prepared-statements. 2021-12-24 00:25:50 +02:00
src/main - Add support for more than one full-text per id. Allow recognizing fileName additions: "id(1).pdf", "id(2).pdf", etc. 2022-02-07 13:57:09 +02:00
.gitignore springified project 2022-01-30 22:15:13 +02:00
Dockerfile - Allow the user to build, push and run the App in Docker, straight though the "installAndRun.sh" script. 2022-02-04 15:49:56 +02:00
README.md - Implement the "getAndUploadFullTexts" functionality. In order to access the S3-ObjectStore from one trusted place, the Controller will request the files from the workers and upload them on S3. Afterwards, the workers will delete those files from their local storage. Previously, each worker uploaded its own files. 2021-11-30 18:23:27 +02:00
build.gradle finished merge 2022-01-31 04:17:16 +02:00
installAndRun.sh - Allow the user to build, push and run the App in Docker, straight though the "installAndRun.sh" script. 2022-02-04 15:49:56 +02:00
settings.gradle - Add the "isControllerAlive"-endpoint. 2021-09-23 15:08:52 +03:00

README.md

UrlsController

This is the Controller's Application.
It receives requests from the workers, constructs an assignments-list from data retrieved from a database, and returns the list to the workers.
Then it receives the "WorkerReports" and writes them into the database.
The database used is Impala.
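The request/report cycle described above can be sketched roughly as follows. This is only an illustration and not the Controller's actual code: the class `ControllerFlowSketch`, its methods, and the in-memory collections standing in for the Impala tables are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, in-memory illustration of the assignments / WorkerReport cycle.
public class ControllerFlowSketch {

    // Stand-ins for database tables (assumption: the real app queries Impala instead).
    private final List<String> pendingUrls = new ArrayList<>();
    private final Map<String, String> storedReports = new HashMap<>();

    public ControllerFlowSketch(List<String> urls) {
        pendingUrls.addAll(urls);
    }

    // A worker asks for work: hand out up to "limit" pending urls and mark them as taken.
    public List<String> getAssignments(int limit) {
        int count = Math.min(limit, pendingUrls.size());
        List<String> batch = new ArrayList<>(pendingUrls.subList(0, count));
        pendingUrls.subList(0, count).clear();
        return batch;
    }

    // A worker sends its report back: store the status of each url.
    public void addWorkerReport(Map<String, String> statusPerUrl) {
        storedReports.putAll(statusPerUrl);
    }

    public int storedReportCount() {
        return storedReports.size();
    }

    public static void main(String[] args) {
        ControllerFlowSketch controller = new ControllerFlowSketch(Arrays.asList("url1", "url2", "url3"));
        List<String> assignments = controller.getAssignments(2);

        // The worker would process the assignments here; this sketch marks them all as done.
        Map<String, String> report = new HashMap<>();
        for (String url : assignments) {
            report.put(url, "done");
        }
        controller.addWorkerReport(report);
        System.out.println("Stored reports: " + controller.storedReportCount());
    }
}
```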
[...]

To install and run the application, first clone the repository with git clone. Then provide a file named "S3_minIO_credentials.txt" inside the working directory.
In the "S3_minIO_credentials.txt" file, provide the endpoint, the accessKey, the secretKey, the region and the bucket, in that order, separated by commas.
Afterwards, execute the installAndRun.sh script.
If you only want to run the app, run the script with the argument "1": ./installAndRun.sh 1
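To make the expected layout of "S3_minIO_credentials.txt" concrete, here is a minimal sketch of how such a file's content could be parsed and validated. It assumes all five values sit on a single line and that none of them contains a comma. The class and method names are hypothetical, the sample values are placeholders, and this is not necessarily how the Controller itself reads the file.

```java
// Hypothetical sketch: parse "endpoint,accessKey,secretKey,region,bucket" from one line.
public class S3CredentialsSketch {

    // Simple holder for the five values, in the order given in the README.
    public static final class S3Credentials {
        public final String endpoint;
        public final String accessKey;
        public final String secretKey;
        public final String region;
        public final String bucket;

        S3Credentials(String[] fields) {
            this.endpoint = fields[0];
            this.accessKey = fields[1];
            this.secretKey = fields[2];
            this.region = fields[3];
            this.bucket = fields[4];
        }
    }

    public static S3Credentials parse(String line) {
        // The limit of -1 keeps trailing empty fields, so "a,b,c,d," is still seen as 5 fields.
        String[] fields = line.trim().split(",", -1);
        if (fields.length != 5) {
            throw new IllegalArgumentException("Expected 5 comma-separated values, found " + fields.length);
        }
        for (int i = 0; i < fields.length; i++) {
            fields[i] = fields[i].trim();
            if (fields[i].isEmpty()) {
                throw new IllegalArgumentException("Value number " + (i + 1) + " is empty");
            }
        }
        return new S3Credentials(fields);
    }

    public static void main(String[] args) {
        // Placeholder values only; real credentials come from the file.
        S3Credentials credentials = parse("http://localhost:9000, myAccessKey, mySecretKey, eu-west-1, my-bucket");
        System.out.println("Endpoint: " + credentials.endpoint + ", bucket: " + credentials.bucket);
    }
}
```

Checking the number of values up front gives a clear error when a value is missing or out of place, instead of a failure later, when the S3 connection is attempted.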