forked from lsmyrnaios/UrlsController

Add/improve documentation.

commit 34d7a143e7 (parent 5dadb8ad2f)

README.md: 28 changed lines
@@ -8,6 +8,18 @@ It can also process **Bulk-Import** requests, from compatible data sources, in w

For interacting with the database we use [**Impala**](https://impala.apache.org/).<br>
<br>

**To install and run the application**:
- Run ```git clone``` and then ```cd UrlsController```.
- Set the preferred values inside the [__application.yml__](https://code-repo.d4science.org/lsmyrnaios/UrlsController/src/branch/master/src/main/resources/application.yml) file. Specifically, for tests, set the ***services.pdfaggregation.controller.isTestEnvironment*** property to "**true**" and make sure the "***services.pdfaggregation.controller.db.testDatabaseName***" property points to a test database.
- Execute the ```installAndRun.sh``` script, which builds and runs the app.<br>
If you want to just run the app, run the script with the argument "1": ```./installAndRun.sh 1```.<br>
If you want to build and run the app in a **Docker container**, run the script with the argument "0" followed by the argument "1": ```./installAndRun.sh 0 1```.<br>
Additionally, if you want to test/visualize the exposed metrics on Prometheus and Grafana, you can deploy their instances in Docker containers by enabling the "runPrometheusAndGrafanaContainers" switch inside the "./installAndRun.sh" script; see the shell sketch below.<br>
<br>
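
A minimal shell walkthrough of the steps above; the clone URL is inferred from the application.yml link and may differ for your fork:

```bash
# Clone the repository and enter its directory.
git clone https://code-repo.d4science.org/lsmyrnaios/UrlsController.git
cd UrlsController

# Build and run the app (default mode).
./installAndRun.sh

# Just run the app, without rebuilding.
./installAndRun.sh 1

# Build and run the app in a Docker container.
./installAndRun.sh 0 1
```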

**BulkImport API**:
- "**bulkImportFullTexts**" endpoint: **http://\<IP\>:\<PORT\>/api/bulkImportFullTexts?provenance=\<provenance\>&bulkImportDir=\<bulkImportDir\>&shouldDeleteFilesOnFinish={true|false}** <br>
This endpoint loads the right configuration with the help of the "provenance" parameter, delegates the processing to a background thread, and immediately returns a message with useful information, including the "reportFileID", which can be used at any moment to request a report on the progress of the bulk-import procedure; an example request follows.<br>
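
An illustrative request; the host, port, and parameter values are placeholders, not project defaults:

```bash
curl "http://localhost:8080/api/bulkImportFullTexts?provenance=myProvenance&bulkImportDir=/mnt/bulk_import/batch_1&shouldDeleteFilesOnFinish=false"
```
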
@@ -15,6 +27,12 @@ For interacting with the database we use [**Impala**](https://impala.apache.org/

- "**getBulkImportReport**" endpoint: **http://\<IP\>:\<PORT\>/api/getBulkImportReport?id=\<reportFileID\>** <br>
This endpoint returns the bulk-import report corresponding to the given reportFileID, in JSON format; an example request follows.<br>
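
An illustrative request; host and port are placeholders, and the ID is the "reportFileID" returned by the bulkImportFullTexts call:

```bash
curl "http://localhost:8080/api/getBulkImportReport?id=<reportFileID>"
```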
<br>

**How to add a bulk-import datasource**:
- Open the [__application.yml__](https://code-repo.d4science.org/lsmyrnaios/UrlsController/src/branch/master/src/main/resources/application.yml) file.
- Add a new object under the "bulk-import.bulkImportSources" property; a sketch follows this list.
- Read the comments at the end of the "bulk-import" property and make sure all requirements are met.
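
A minimal sketch of what such an entry might look like. The field names and values below are illustrative assumptions, not the actual schema; follow the existing entries and the comments under the "bulk-import" property for the real requirements:

```yaml
bulk-import:
  bulkImportSources:
    # Hypothetical entry with placeholder field names and values.
    mySource:
      datasourceID: "mysource____::1234567890abcdef"  # placeholder
      datasourcePrefix: "mysource____"                # placeholder
      fulltextUrlPrefix: "https://example.org/files/" # placeholder
      mimeType: "application/pdf"                     # placeholder
```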
<br>
<br>

**Statistics API**:

@@ -60,16 +78,6 @@ Note: The Shutdown Service API is accessible by the Controller's host machine.

<br>
<br>

**To install and run the application**:
- Run ```git clone``` and then ```cd UrlsController```.
- Set the preferred values inside the [__application.yml__](https://code-repo.d4science.org/lsmyrnaios/UrlsController/src/branch/master/src/main/resources/application.yml) file.
- Execute the ```installAndRun.sh``` script, which builds and runs the app.<br>
If you want to just run the app, run the script with the argument "1": ```./installAndRun.sh 1```.<br>
If you want to build and run the app in a **Docker container**, run the script with the argument "0" followed by the argument "1": ```./installAndRun.sh 0 1```.<br>
Additionally, if you want to test/visualize the exposed metrics on Prometheus and Grafana, you can deploy their instances in Docker containers by enabling the "runPrometheusAndGrafanaContainers" switch inside the "./installAndRun.sh" script.<br>
<br>

**Implementation notes**:
- For transferring the full-text files, we use Facebook's [**Zstandard**](https://facebook.github.io/zstd/) compression algorithm, which offers significant benefits in compression ratio and speed.
- The uploaded full-text files follow this naming scheme: "**datasourceID/recordID::fileHash.pdf**" (see the example below).
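
A hypothetical illustration of the naming scheme and of Zstandard on the command line. The IDs, hash, and file names are placeholders, and the zstd CLI stands in for however the service actually invokes the algorithm:

```bash
# Naming scheme: datasourceID/recordID::fileHash.pdf, e.g.:
#   opendoar____::1234/od________1234::a1b2c3d4e5f6a7b8.pdf

# Compress and decompress a batch of full-texts with the zstd CLI.
zstd -T0 fulltexts.tar       # produces fulltexts.tar.zst
zstd -d fulltexts.tar.zst    # restores fulltexts.tar
```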

@@ -509,7 +509,7 @@ public class FileUtils {
    {
        // Iterate over the files and upload them to S3.
        //int numUploadedFiles = 0;
-       for( String fileName : fileNames )
+       for ( String fileName : fileNames )
        {
            if ( fileName.contains(".tar") ) // Exclude the tar-files from uploading (".tar" and ".tar.zstd").
                continue;
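
For context, a self-contained sketch of the filtering logic in this loop; the upload call is a placeholder, not the project's actual S3 client:

```java
import java.util.List;

public class UploadFilterSketch {

    // Hypothetical stand-in for the project's S3 upload routine.
    static void uploadToS3(String fileName) {
        System.out.println("Would upload: " + fileName);
    }

    public static void main(String[] args) {
        List<String> fileNames = List.of("doc1.pdf", "batch.tar", "batch.tar.zstd", "doc2.pdf");
        for ( String fileName : fileNames ) {
            if ( fileName.contains(".tar") ) // Exclude the tar-files (".tar" and ".tar.zstd").
                continue;
            uploadToS3(fileName); // Only the individual full-text files are uploaded.
        }
    }
}
```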

@@ -58,7 +58,7 @@ bulk-import:
    # For "authoritative" sources, a special prefix is selected, from: https://graph.openaire.eu/docs/data-model/pids-and-identifiers/#identifiers-in-the-graph
    # For the rest, the "datasource_prefix" is selected, using this query:
    # select datasource.namespaceprefix.value
-   # from openaire_prod_20230414.datasource -- Here use the latest production-table.
+   # from openaire_prod_<PROD_DATE>.datasource -- Here use the production-table with the latest date.
    # where officialname.value = 'datasourceOfficialName';
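
Assembled, the query from that comment reads as follows; the table date and the official name are placeholders to be filled in:

```sql
select datasource.namespaceprefix.value
from openaire_prod_<PROD_DATE>.datasource   -- use the production table with the latest date
where officialname.value = 'datasourceOfficialName';
```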