Commit Graph

49 Commits

Author SHA1 Message Date
Lampros Smyrnaios 2e4c1323a3 - Add check for null or empty id or url.
- Code cleanup.
2022-01-28 03:47:46 +02:00
Lampros Smyrnaios a428b1d1e6 - Fix not prioritizing the gradle version defined inside the "installAndRun.sh" script.
- Update SpringBoot dependency.
2022-01-21 15:19:52 +02:00
Lampros Smyrnaios 8912bb1cf9 Fix adding an invalid error-message in case of an "alreadyDownloaded" full-text being discovered inside the "FileUtils.dataToBeLoggedList". 2022-01-17 23:46:15 +02:00
Lampros Smyrnaios 0032a8018f - Improve search-accuracy of "alreadyDownloaded" full-texts.
- Handle the potential error-case of an "alreadyDownloaded" full-text not being discovered inside the "FileUtils.dataToBeLoggedList".
2022-01-17 10:12:48 +02:00
Lampros Smyrnaios d61ff4b6dd Integrate some changes from the "PublicationsRetrieverPlugin". 2022-01-14 15:13:00 +02:00
Lampros Smyrnaios 8abb260d60 - In case of an unknown (non-documented) exception inside "LoaderAndChecker.invokeAllTasksAndWait", it will now be logged and the app will gracefully shut down with an error-message in the Error-stream.
- Avoid double-checking the handledAssignments (in order to delete their full-texts) when the app is about to shut down, in case the "maxAssignmentsBatchesToHandleBeforeRestart" is set above zero.
2022-01-04 00:23:45 +02:00
Lampros Smyrnaios 92d011e8a0 - Make sure the full-texts of the handled assignments are deleted before the application exits.
- When the user sets the "maxAssignmentsBatchesToHandleBeforeRestart" above zero, shut down immediately after the last assignments-batch. Do not wait for the next scheduled check.
- Allow the user to set the "maxAssignmentsBatchesToHandleBeforeRestart" in the "installAndRun.sh" script.
- Increase the "fixedRate" for the "ScheduledTasks.deleteHandledAssignmentsFullTexts()" method to 12 hours.
- Update README.md
2021-12-31 04:09:05 +02:00
Lampros Smyrnaios 1ddfd34236 - Allow the user to set a maximum number of assignments-batches for the Worker to handle. After handling those batches, the Worker will shut down. A number of < 0 > indicates an infinite number of batches.
- Avoid converting the zero fileSize to < null >. Now, the default value is < null >, so the zero-value will indicate a zero-byte file.
- Update dependencies.
- Code cleanup.
2021-12-24 00:12:34 +02:00
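Commit 1ddfd34236 introduces a user-defined maximum number of assignments-batches, where < 0 > means "unlimited". Together with 92d011e8a0, the Worker shuts down right after the last batch. The sketch below shows that counting rule in isolation. It is a minimal illustration under those assumed semantics, not the Worker's actual code. The class `BatchLimit` and the method `recordBatchAndCheckShutdown` are hypothetical names. In the real project, the setting is "maxAssignmentsBatchesToHandleBeforeRestart".

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: tracks handled assignments-batches against a user-defined limit.
// Assumed semantics (from the commit message): 0 means "handle batches forever".
class BatchLimit {
    private final int maxBatchesBeforeRestart;
    private final AtomicInteger handledBatches = new AtomicInteger(0);

    BatchLimit(int maxBatchesBeforeRestart) {
        if (maxBatchesBeforeRestart < 0)
            throw new IllegalArgumentException("The batch limit must be zero (unlimited) or positive.");
        this.maxBatchesBeforeRestart = maxBatchesBeforeRestart;
    }

    // Records one handled batch and reports whether the Worker should now shut down.
    boolean recordBatchAndCheckShutdown() {
        int handled = handledBatches.incrementAndGet();
        return maxBatchesBeforeRestart > 0 && handled >= maxBatchesBeforeRestart;
    }

    public static void main(String[] args) {
        BatchLimit limited = new BatchLimit(2);
        System.out.println(limited.recordBatchAndCheckShutdown()); // prints false
        System.out.println(limited.recordBatchAndCheckShutdown()); // prints true

        BatchLimit unlimited = new BatchLimit(0);
        System.out.println(unlimited.recordBatchAndCheckShutdown()); // prints false
    }
}
```

The counter is an `AtomicInteger` on the assumption that batches may be recorded from a scheduler thread other than the one that created the object.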
Lampros Smyrnaios a8e2ddcf54 - Reduce the "PublicationsRetriever.threadsMultiplier" to 4.
- Eliminate a possible NPE.
2021-12-20 22:25:27 +02:00
Lampros Smyrnaios c46c8c448a - Upgrade the zip-file delivery by using the "InputStreamResource". This approach is more reliable, has better performance and uses less memory.
- Use the "InputStreamResource" also in "get(single)FullText"-endpoint, in order to avoid loading a big full-text file in memory.
- Decrease the system-reserved memory by 128 MB.
- Fix path-variable regexes for "getFullText"-endpoint.
- Optimize imports.
- Code cleanup.
2021-12-17 08:25:54 +02:00
Lampros Smyrnaios 4fb5becace - Increase the system-reserved memory, in "installAndRun.sh".
- Fix not closing the zip-entry in case of an error.
2021-12-17 00:26:47 +02:00
Lampros Smyrnaios 82d69f3bf5 - Calculate and set the max heap size with respect to the system resources, in "installAndRun.sh".
- Fix not setting the right "Error"-members when the docUrl was found, but the full-text was not retrieved.
- Set a "couldRetry"-indication in the "Error"-class when the full-text was retrieved since, in general, a retry could give the same successful result.
- Update the "docFileNotRetrieved"-check to use the standardized string.
- Eliminate some possible NPEs.
- Update Gradle.
2021-12-16 02:04:05 +02:00
Lampros Smyrnaios 0db35a83e7 - Reduce memory consumption and fix a potential issue, where many "already-retrieved" full-texts would be already deleted (or in different directories, for a short time), as they belonged to a previous assignments-batch (this case is now possible, after the following fix).
- Fix a bug causing a missing character in the "alreadyDownloaded" full-text fileName, which in turn prevented the file-data of that record from being updated with the file-data of the record for which the same file was initially downloaded.
2021-12-13 21:16:30 +02:00
Lampros Smyrnaios 859f850f56 - Improve Zip-file delivery and significantly decrease memory consumption, by streaming it, instead of loading the whole file in memory before sending it to the Controller.
- Fix not closing the inputStream of the zip-file.
- Count and show the number of files which were zipped in each batch.
2021-12-13 15:29:03 +02:00
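Commit 859f850f56 streams the zip-file instead of loading the whole file in memory. Commit 4fb5becace makes sure each zip-entry is closed even on error. The sketch below shows the same idea with the standard `java.util.zip` API: each file is copied in chunks straight into the output stream, and the input stream and entry are closed even if copying fails. It is an illustration only, not the Worker's "FileZipper" code. `ZipStreamer` and `zipFilesTo` are hypothetical names, and the sketch assumes the file names within one batch are unique, because `ZipOutputStream` rejects duplicate entry names.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical sketch: streams files into a zip written straight to "out",
// so no whole file (and no whole zip) is held in memory.
class ZipStreamer {

    // Returns the number of files that were actually zipped; missing files are skipped.
    // Note: closing the ZipOutputStream also closes "out".
    static int zipFilesTo(List<Path> files, OutputStream out) throws IOException {
        int zippedCount = 0;
        try (ZipOutputStream zos = new ZipOutputStream(new BufferedOutputStream(out))) {
            for (Path file : files) {
                if (!Files.isRegularFile(file))
                    continue;
                zos.putNextEntry(new ZipEntry(file.getFileName().toString()));
                try (InputStream in = Files.newInputStream(file)) {
                    in.transferTo(zos); // copies in fixed-size chunks (Java 9+)
                } finally {
                    zos.closeEntry(); // close the entry even if the copy failed
                }
                zippedCount++;
            }
        }
        return zippedCount;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("fulltexts");
        Path a = Files.write(dir.resolve("a.pdf"), new byte[]{1, 2, 3});
        Path b = Files.write(dir.resolve("b.pdf"), new byte[]{4, 5});
        ByteArrayOutputStream zipBytes = new ByteArrayOutputStream();
        int zipped = zipFilesTo(List.of(a, b, dir.resolve("missing.pdf")), zipBytes);
        System.out.println(zipped); // prints 2
    }
}
```

In a web endpoint, `out` would typically be the response stream, so memory use stays bounded by the copy buffer rather than by the size of the batch. The returned count corresponds to the per-batch "number of files which were zipped" that the same commit logs.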
Lampros Smyrnaios ab5e04698c - Fix a bug causing the "fileLocation" to be set to the value of the "errorCause" when the docFile was not retrieved.
- Leave the "fileLocation" as NULL when the docFile was not retrieved. Previously, the value "File not retrieved" was assigned (only in theory, because the bug above caused the related check to always fail).
- Verify that the "ControllerBaseUrl" given by the user is not malformed.
2021-12-07 19:33:10 +02:00
Lampros Smyrnaios fd5b56e3c6 - Allow the user to set the "maxAssignmentsLimitPerBatch" value.
- Set increased lower and upper limits for the Java Heap Size.
- Update the "ServerBaseURL" to the Public IP Address of the machine which is running the app.
- Improve two log-messages.
2021-12-07 00:52:40 +02:00
Lampros Smyrnaios ce49bff50e - Reduce memory consumption when loading a zipFile.
- Check whether the "zipBatchCounter" is larger than the "totalZipBatches".
- Improve the "failed tasks" log-message.
2021-12-03 16:29:16 +02:00
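Commit ce49bff50e checks whether the "zipBatchCounter" is larger than the "totalZipBatches". Commit 018326eedd checks for an empty "fileNamesWithExtensions"-list. The sketch below shows that bookkeeping for splitting an assignments-number's full-texts into zip batches. The batch size and numbering are not stated in the log, so the sketch assumes 1-based batch counters and a fixed number of files per batch, where the last batch may hold fewer. `ZipBatches`, `totalBatches` and `batch` are hypothetical names.

```java
import java.util.List;

// Hypothetical sketch: splits the full-text file names of one assignments-number into zip batches
// and rejects a batch counter larger than the total number of batches.
// Assumptions: batch counters are 1-based and every batch holds "filesPerBatch" files (the last may hold fewer).
class ZipBatches {

    // Ceiling division: 5 files with 2 per batch need 3 batches.
    static int totalBatches(int totalFiles, int filesPerBatch) {
        if (filesPerBatch <= 0)
            throw new IllegalArgumentException("filesPerBatch must be positive");
        return (totalFiles + filesPerBatch - 1) / filesPerBatch;
    }

    static List<String> batch(List<String> fileNames, int filesPerBatch, int batchCounter) {
        if (fileNames.isEmpty())
            throw new IllegalArgumentException("The list of file names is empty.");
        int total = totalBatches(fileNames.size(), filesPerBatch);
        if (batchCounter < 1 || batchCounter > total)
            throw new IllegalArgumentException("Batch " + batchCounter + " is outside [1, " + total + "].");
        int from = (batchCounter - 1) * filesPerBatch;
        int to = Math.min(from + filesPerBatch, fileNames.size());
        return fileNames.subList(from, to);
    }

    public static void main(String[] args) {
        List<String> names = List.of("a.pdf", "b.pdf", "c.pdf", "d.pdf", "e.pdf");
        System.out.println(totalBatches(names.size(), 2)); // prints 3
        System.out.println(batch(names, 2, 3));            // prints [e.pdf]
    }
}
```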
Lampros Smyrnaios 018326eedd - Optimize the "FileZipper.zipMultipleFilesAndGetZip()" and "FileZipper.zipAFile()" methods.
- Improve the "getMultipleFullTexts"-endpoint. Check if the "fileNamesWithExtensions"-list is empty. Check if the baseDir for the fullTexts of a given assignments-counter is missing.
- Optimize the "PublicationsRetrieverPlugin.processAssignments()" method.
- Set a max-size limit to the amount of space the logs can use. Over that size, the older logs will be deleted.
- Show the heap size, in the beginning.
- Update Gradle.
- Code cleanup.
2021-12-03 04:09:40 +02:00
Lampros Smyrnaios 212f8f377d - Set the "ConnSupportUtils.shouldBlockMost5XXDomains" to "false" and call the "LoaderAndChecker.setCouldRetryRegex()" method. Together, these make sure that, for HTTP-5XX-errors, only the 511-domains get blocked and only the 511-urls get labeled with "noRetry".
- Improve performance and reduce memory consumption, by calling the "ConnSupportUtils.setKnownMimeTypes()" method only once, in the constructor-method.
- Code cleanup.
2021-11-30 06:57:51 +02:00
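Commit 212f8f377d narrows the 5XX handling so that only HTTP 511 ("Network Authentication Required", RFC 6585) blocks the domain and marks the url as "noRetry", while other 5XX codes remain retryable. The sketch below restates that rule as a pure function for clarity. It does not use or reproduce the "PublicationsRetriever" internals ("ConnSupportUtils", "LoaderAndChecker"). `ServerErrorPolicy`, `Outcome` and `on5xx` are hypothetical names.

```java
// Hypothetical sketch of the 5XX rule described in commit 212f8f377d:
// only HTTP 511 blocks the domain and marks the url as "noRetry"; other 5XX codes stay retryable.
class ServerErrorPolicy {

    enum Outcome { BLOCK_DOMAIN_AND_NO_RETRY, COULD_RETRY }

    static Outcome on5xx(int statusCode) {
        if (statusCode < 500 || statusCode > 599)
            throw new IllegalArgumentException("Not an HTTP-5XX status: " + statusCode);
        // 511 = "Network Authentication Required" (RFC 6585).
        return (statusCode == 511) ? Outcome.BLOCK_DOMAIN_AND_NO_RETRY : Outcome.COULD_RETRY;
    }

    public static void main(String[] args) {
        System.out.println(on5xx(511)); // prints BLOCK_DOMAIN_AND_NO_RETRY
        System.out.println(on5xx(503)); // prints COULD_RETRY
    }
}
```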
Lampros Smyrnaios 6355b3e397 - Increase the "PublicationsRetriever.threadsMultiplier" to "6", as the threads are mostly network-blocked.
- Make sure the "maven" package is installed before compiling the "PublicationsRetriever" library.
- Update dependencies.
2021-11-30 01:02:06 +02:00
Lampros Smyrnaios 045788c728 - Use the "Timestamp" data-type instead of the "Date", in order to include more information.
- Code cleanup.
2021-11-27 02:37:33 +02:00
Lampros Smyrnaios 20b71164d5 - The worker will store the files in its local file-system and will send them to the controller in batches, after the latter requests them. When all files from a given assignments-num are sent, the files will be deleted from the Worker, in a scheduled-job.
- Implement the "getFullTexts"-endpoint, which returns the requested full-texts in a zip file.
- Implement the "getFullText"-endpoint, which returns the requested full-text.
- Implement the "getHandledAssignmentsCounts"-endpoint, which returns the assignments-numbers that were handled by that worker.
- Make sure each urlReport has the same "Date" for a given assignments-number. Also, make sure the "size" and "hash" have a "null" value, in case the full-text was not found.
- Check and log thread-pool shutdown errors.
- Add the stack-trace in the error-logs, instead of the Stderr.
- Update SpringBoot dependency.
- Change log levels.
- Code cleanup.
2021-11-26 17:04:31 +02:00
Lampros Smyrnaios 3220c97373 - Improve performance when requesting, processing and posting requests.
- Fix a bug, causing degraded performance when processing more than 3000 assignments.
- Fix the progress percentage shown in the logs.
- Avoid a potential NPE when processing a broken "Assignment" object.
- Update Spring to v.2.5.6.
- Code cleanup.
2021-10-30 17:14:18 +03:00
Lampros Smyrnaios 0f12a9305c - Decrease the time interval for the scheduled task "handleNewAssignments". This helps to reduce the "dead-time" between reporting the current assignments and requesting the new ones.
- Avoid a potential NPE when giving information about the received AssignmentRequest.
- Log and return, when the received assignments-list is empty.
- Improve some logging-messages.
- Update the logs' fileName and change the preferred appender to "File".
- Code cleanup.
2021-10-14 03:03:47 +03:00
Lampros Smyrnaios 380137fbff - Add an HTTP-error-handler in "AssignmentHandler.requestAssignments()".
- Increase the "requestConnectTimeoutDuration" and the "requestReadTimeoutDuration".
- Increase project's version to "1.0.0-SNAPSHOT".
- Update dependencies.
- Code cleanup.
2021-10-11 13:27:40 +03:00
Lampros Smyrnaios 42f8cb769d Update "installAndRun.sh": check if a gradle installation with the given version already exists, before downloading and installing gradle. 2021-10-11 11:19:52 +03:00
Lampros Smyrnaios 5386035397 - Add timeDurationLimits to wait for the requested Assignments to come from the Controller.
- Make sure that the test-Results do not get posted to the Controller and written to the database.
- Improve error-handling in "AssignmentHandler.requestAssignments()".
2021-09-23 16:23:49 +03:00
Lampros Smyrnaios e091a029a8 - Fix the project's name inside "settings.gradle".
- Fix the "change-dir" to the "libs"-directory in "installAndRun.sh"
2021-09-22 17:06:30 +03:00
Lampros Smyrnaios 2ffb44a615 - Update the "installAndRun.sh":
--Ask the user to give the "workerId" and the "controllerBaseUrl".
--Make sure the "libs" directory is created, if it does not exist.
--Make sure the "unzip" package is installed.
- Change the data-type of the "UrlReport.status" to be "enum StatusType", in order to increase consistency and comparability.
- Update the guidelines in the README.
2021-09-22 16:36:48 +03:00
Lampros Smyrnaios 61597d1627 - Read the Controller's url from a file, when starting the Application.
- Switch the "AssignmentsHandler.askForTest" to "false".
- Get the size and the hash of a docFile which was previously downloaded by another ID in that batch.
- Reset the "AssignmentHandler.urlReports" list after posting the results to the Controller.
- Enhance logging and comments.
- Add more guidelines in the README.
- Disable the scheduled test-live job.
- Code cleanup.
2021-09-21 16:21:39 +03:00
Lampros Smyrnaios 32aff8c44a - Update the "installAndRun.sh" script to be able to just run the app (without re-installing), if you want.
- Fix a missing "mimeType"-assignment.
- Add a ".gitignore" file.
2021-09-09 16:28:58 +03:00
Lampros Smyrnaios b2788d31a9 - Integrate the latest changes from the "PublicationsRetriever"-plugin. The fileSize and the fileHash are computed inside the plugin now.
- Make the "mimeType" "null", when no docFile was retrieved.
- Signal the scheduler that the worker is ready for work, when it has finished processing but not yet posted the previous data.
- Fix a minor bug; now return "false" when there is any problem with the url of a specific task.
- Avoid memory re-allocations for "callableTasks".
2021-09-08 05:02:14 +03:00
Lampros Smyrnaios 6fd9eed1ec - Eliminate some warnings, by excluding an inner dependency.
- Comment-out some debugging gradle commands.
2021-09-02 18:35:47 +03:00
Lampros Smyrnaios b6d66653f7 - Integrate the latest changes from the "PublicationsRetriever"-plugin.
- Update dependencies.
2021-09-01 19:42:32 +03:00
Lampros Smyrnaios 5bbf422d3b - Add README.md
- Avoid sending a null-Error-object, which increases the complexity on the controller's side. Instead, send an Error object with "null" members.
2021-08-05 20:41:32 +03:00
Lampros Smyrnaios 62ce7ee4a5 - Process the Error of PDF-aggregation. Distinguish between "couldRetry" and "noRetry" cases.
- Add a "test"-switch in order to easily switch between test and normal mode.
- Fix an NPE when requesting the "AssignmentRequest".
- Upgrade the "installPublicationsRetriever.sh" to "installAndRun.sh", which takes care of everything.
- Define the newest SpringBoot-version in "build.gradle".
- Code cleanup.
2021-08-05 15:09:28 +03:00
Lampros Smyrnaios 6cc2673fca - Add the ability to upload the files on an S3-ObjectStore.
- Change the server's port and the port of the controller-api.
- Update dependencies.
2021-07-29 09:01:53 +03:00
Lampros Smyrnaios 6307cda23a - Refactor the assignments-handling. In order to match with the database schema, now the AssignmentRequest returns a list of Assignments instead of a single assignment having a list of Tasks.
- Clean up the members of the "Payload" model.
2021-07-05 15:00:29 +03:00
Lampros Smyrnaios f6e53ca289 Integrate the "PublicationsRetriever" program as a plugin, which downloads the full-texts of the publications. Afterwards, the retrieved data info is transferred to the Controller.
The "PublicationsRetriever" can be installed locally as a library, using the "installPublicationsRetriever.sh" script.
2021-06-22 05:58:07 +03:00
Lampros Smyrnaios 83d1bd2def Update the "WorkerReport" response and the "UrlReport" and "Payload" models. 2021-06-19 07:12:24 +03:00
Lampros Smyrnaios 3550ed71d9 Execute the "AssignmentHandler.handleAssignment()", only from the scheduler, as it starts automatically when the program starts. 2021-06-11 13:44:33 +03:00
Lampros Smyrnaios 5f3409e072 - Update Spring and add the "gradle-wrapper.properties" file which defines the gradle version.
- Improve an info-logging message and clean up the code.
2021-06-10 14:29:20 +03:00
Lampros Smyrnaios 53ccea869a Add an "assignmentId" field in the "Assignment"-class. 2021-06-09 05:45:07 +03:00
Lampros Smyrnaios 82e12655e7 - Add an "AssignmentHandler", which retrieves the assignment from the controller and categorises the tasks using their datasource. In the future, it will execute the tasks of the assignment, using different plugins. It runs upon the Application start and also every 30 mins (if no other job is in execution).
- Add the "isWorkerAvailableForWork"-endpoint.
2021-05-20 03:28:48 +03:00
Lampros Smyrnaios a4c97dffbf - Add the "Datasource" class in the "Task" class and include it in the Assignment that the worker retrieves.
- Update dependencies.
2021-05-20 02:58:08 +03:00
Lampros Smyrnaios 137744a8ce - Add classes: "Assignment", "AssignmentRequest", "UrlReport", "WorkerReport" and "WorkerResponse".
- Add interface "WorkerConstants".
2021-04-24 21:10:35 +03:00
Lampros Smyrnaios b2dfd524e1 Update "Task" and "Error" classes. 2021-04-24 21:08:02 +03:00
Lampros Smyrnaios 6a87d3e478 Add "Task" and "Error" classes. 2021-04-15 03:37:54 +03:00
Lampros Smyrnaios 08eabe6f08 Initial commit of UrlsWorker. 2021-03-16 18:38:53 +02:00