forked from lsmyrnaios/UrlsController

Add/improve documentation.

commit 34d7a143e7 (parent 5dadb8ad2f)

README.md: 28 changed lines
@@ -8,6 +8,18 @@ It can also process **Bulk-Import** requests, from compatible data sources, in w

For interacting with the database we use [**Impala**](https://impala.apache.org/).<br>
<br>

**To install and run the application**:
- Run ```git clone``` and then ```cd UrlsController```.
- Set the preferred values inside the [__application.yml__](https://code-repo.d4science.org/lsmyrnaios/UrlsController/src/branch/master/src/main/resources/application.yml) file. Specifically, for tests, set the ***services.pdfaggregation.controller.isTestEnvironment*** property to "**true**" and make sure the "***services.pdfaggregation.controller.db.testDatabaseName***" property points to a test database.
- Execute the ```installAndRun.sh``` script, which builds and runs the app.<br>
If you want to just run the app, run the script with the argument "1": ```./installAndRun.sh 1```.<br>
If you want to build and run the app in a **Docker container**, run the script with the argument "0" followed by the argument "1": ```./installAndRun.sh 0 1```.<br>
Additionally, if you want to test/visualize the exposed metrics on Prometheus and Grafana, you can deploy their instances in Docker containers by enabling the "runPrometheusAndGrafanaContainers" switch inside the "./installAndRun.sh" script; see the shell sketch below.<br>
<br>
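
A minimal shell walkthrough of the steps above; the clone URL is inferred from the application.yml link and may differ for your fork:

```bash
# Clone the repository and enter its directory.
git clone https://code-repo.d4science.org/lsmyrnaios/UrlsController.git
cd UrlsController

# Build and run the app (default mode).
./installAndRun.sh

# Just run the app, without rebuilding.
./installAndRun.sh 1

# Build and run the app in a Docker container.
./installAndRun.sh 0 1
```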

**BulkImport API**:
- "**bulkImportFullTexts**" endpoint: **http://\<IP\>:\<PORT\>/api/bulkImportFullTexts?provenance=\<provenance\>&bulkImportDir=\<bulkImportDir\>&shouldDeleteFilesOnFinish={true|false}** <br>
This endpoint loads the right configuration with the help of the "provenance" parameter, delegates the processing to a background thread, and immediately returns a message with useful information, including the "reportFileID", which can be used at any moment to request a report on the progress of the bulk-import procedure; an example request follows.<br>
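
An illustrative request; the host, port, and parameter values are placeholders, not project defaults:

```bash
curl "http://localhost:8080/api/bulkImportFullTexts?provenance=myProvenance&bulkImportDir=/mnt/bulk_import/batch_1&shouldDeleteFilesOnFinish=false"
```
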
@@ -15,6 +27,12 @@ For interacting with the database we use [**Impala**](https://impala.apache.org/

- "**getBulkImportReport**" endpoint: **http://\<IP\>:\<PORT\>/api/getBulkImportReport?id=\<reportFileID\>** <br>
This endpoint returns the bulk-import report corresponding to the given reportFileID, in JSON format; an example request follows.<br>
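
An illustrative request; host and port are placeholders, and the ID is the "reportFileID" returned by the bulkImportFullTexts call:

```bash
curl "http://localhost:8080/api/getBulkImportReport?id=<reportFileID>"
```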
<br>

**How to add a bulk-import datasource**:
- Open the [__application.yml__](https://code-repo.d4science.org/lsmyrnaios/UrlsController/src/branch/master/src/main/resources/application.yml) file.
- Add a new object under the "bulk-import.bulkImportSources" property; a sketch follows this list.
- Read the comments at the end of the "bulk-import" property and make sure all requirements are met.
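
A minimal sketch of what such an entry might look like. The field names and values below are illustrative assumptions, not the actual schema; follow the existing entries and the comments under the "bulk-import" property for the real requirements:

```yaml
bulk-import:
  bulkImportSources:
    # Hypothetical entry with placeholder field names and values.
    mySource:
      datasourceID: "mysource____::1234567890abcdef"  # placeholder
      datasourcePrefix: "mysource____"                # placeholder
      fulltextUrlPrefix: "https://example.org/files/" # placeholder
      mimeType: "application/pdf"                     # placeholder
```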
<br>
<br>

**Statistics API**:

@@ -60,16 +78,6 @@ Note: The Shutdown Service API is accessible by the Controller's host machine.

<br>
<br>

**To install and run the application**:
- Run ```git clone``` and then ```cd UrlsController```.
- Set the preferred values inside the [__application.yml__](https://code-repo.d4science.org/lsmyrnaios/UrlsController/src/branch/master/src/main/resources/application.yml) file.
- Execute the ```installAndRun.sh``` script, which builds and runs the app.<br>
If you want to just run the app, run the script with the argument "1": ```./installAndRun.sh 1```.<br>
If you want to build and run the app in a **Docker container**, run the script with the argument "0" followed by the argument "1": ```./installAndRun.sh 0 1```.<br>
Additionally, if you want to test/visualize the exposed metrics on Prometheus and Grafana, you can deploy their instances in Docker containers by enabling the "runPrometheusAndGrafanaContainers" switch inside the "./installAndRun.sh" script.<br>
<br>

**Implementation notes**:
- For transferring the full-text files, we use Facebook's [**Zstandard**](https://facebook.github.io/zstd/) compression algorithm, which offers significant benefits in compression ratio and speed.
- The uploaded full-text files follow this naming scheme: "**datasourceID/recordID::fileHash.pdf**" (see the example below).
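
A hypothetical illustration of the naming scheme and of Zstandard on the command line. The IDs, hash, and file names are placeholders, and the zstd CLI stands in for however the service actually invokes the algorithm:

```bash
# Naming scheme: datasourceID/recordID::fileHash.pdf, e.g.:
#   opendoar____::1234/od________1234::a1b2c3d4e5f6a7b8.pdf

# Compress and decompress a batch of full-texts with the zstd CLI.
zstd -T0 fulltexts.tar       # produces fulltexts.tar.zst
zstd -d fulltexts.tar.zst    # restores fulltexts.tar
```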

@@ -509,7 +509,7 @@ public class FileUtils {
    {
        // Iterate over the files and upload them to S3.
        //int numUploadedFiles = 0;
-       for( String fileName : fileNames )
+       for ( String fileName : fileNames )
        {
            if ( fileName.contains(".tar") ) // Exclude the tar-files from uploading (".tar" and ".tar.zstd").
                continue;
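
For context, a self-contained sketch of the filtering logic in this loop; the upload call is a placeholder, not the project's actual S3 client:

```java
import java.util.List;

public class UploadFilterSketch {

    // Hypothetical stand-in for the project's S3 upload routine.
    static void uploadToS3(String fileName) {
        System.out.println("Would upload: " + fileName);
    }

    public static void main(String[] args) {
        List<String> fileNames = List.of("doc1.pdf", "batch.tar", "batch.tar.zstd", "doc2.pdf");
        for ( String fileName : fileNames ) {
            if ( fileName.contains(".tar") ) // Exclude the tar-files (".tar" and ".tar.zstd").
                continue;
            uploadToS3(fileName); // Only the individual full-text files are uploaded.
        }
    }
}
```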

@@ -58,7 +58,7 @@ bulk-import:
    # For "authoritative" sources, a special prefix is selected, from: https://graph.openaire.eu/docs/data-model/pids-and-identifiers/#identifiers-in-the-graph
    # For the rest, the "datasource_prefix" is selected, using this query:
    # select datasource.namespaceprefix.value
-   # from openaire_prod_20230414.datasource -- Here use the latest production-table.
+   # from openaire_prod_<PROD_DATE>.datasource -- Here use the production-table with the latest date.
    # where officialname.value = 'datasourceOfficialName';
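
Assembled, the query from that comment reads as follows; the table date and the official name are placeholders to be filled in:

```sql
select datasource.namespaceprefix.value
from openaire_prod_<PROD_DATE>.datasource   -- use the production table with the latest date
where officialname.value = 'datasourceOfficialName';
```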