# UrlsController
The Controller application receives requests from the [Workers](https://code-repo.d4science.org/lsmyrnaios/UrlsWorker), constructs an assignments-list with data retrieved from a database and returns the list to the Workers.<br>
Then, it receives the "WorkerReports", requests the full-texts from the Workers in batches and uploads them to the S3 Object Store. Finally, it writes the related reports, along with the updated file-locations, into the database.<br>
The database used is [Impala](https://impala.apache.org/).<br>
<br>
Statistics API:
- "**getNumberOfPayloads**" endpoint: **http://IP:PORT/api/stats/getNumberOfPayloads**
- "**getNumberOfRecordsInspected**" endpoint: **http://IP:PORT/api/stats/getNumberOfRecordsInspected**
<br>
<br>
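
The Statistics API endpoints listed above can be queried from the command line. The following is a minimal sketch, assuming the endpoints answer plain HTTP GET requests; the helper names `stats_url` and `fetch_stat`, as well as the host and port in the example, are hypothetical, and the response format is not specified here.

```shell
# Build the URL of a Statistics API endpoint.
# Arguments: host (IP or name), port, endpoint name.
stats_url() {
  printf 'http://%s:%s/api/stats/%s\n' "$1" "$2" "$3"
}

# Fetch one statistic. Assumes the endpoint answers plain HTTP GET requests.
fetch_stat() {
  curl -sf "$(stats_url "$1" "$2" "$3")"
}

# Example (hypothetical host and port; replace with the Controller's actual ones):
#   fetch_stat 127.0.0.1 8080 getNumberOfPayloads
#   fetch_stat 127.0.0.1 8080 getNumberOfRecordsInspected
```
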
To install and run the application:
- Run ```git clone``` and then ```cd UrlsController```.
- Provide the **S3 Object Store** configuration inside the *src/main/resources/application.properties* file.<br>
- Execute the ```installAndRun.sh``` script, which builds and runs the app.<br>
If you only want to run the app, run the script with the argument "1": ```./installAndRun.sh 1```.<br>
If you want to build and run the app in a **Docker Container**, run the script with the argument "0" followed by the argument "1": ```./installAndRun.sh 0 1```.<br>
<br>
Implementation notes:
- For transferring the full-text files, we use Facebook's [**Zstandard**](https://facebook.github.io/zstd/) compression algorithm, which offers substantial benefits in compression ratio and speed.
- The names of the uploaded full-text files are of the following form: "***datasourceID/recordId::fileHash.pdf***"
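
To illustrate the naming scheme above, here is a small sketch that assembles an object name from its three parts. The function name `fulltext_object_name` and the sample values are hypothetical; how the Controller actually computes the file hash is not covered here.

```shell
# Compose the object name "datasourceID/recordId::fileHash.pdf".
# Arguments: datasource ID, record ID, file hash (all supplied by the caller).
fulltext_object_name() {
  printf '%s/%s::%s.pdf\n' "$1" "$2" "$3"
}

# Example with made-up values:
#   fulltext_object_name ds1 rec42 abc123
```
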