1) Pre-calculate the file-hashes for all files of the segment and perform a single "getHashLocationsQuery", instead of thousands of single-hash queries.
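A minimal sketch of the idea, assuming Apache Commons Codec for the MD5 hashing; the method name and the "payload" table/column names are illustrative, not the project's actual API:

```java
import org.apache.commons.codec.digest.DigestUtils;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

static String buildHashLocationsQuery(List<Path> segmentFiles) throws IOException {
    List<String> fileHashes = new ArrayList<>();
    for ( Path file : segmentFiles ) {
        try ( InputStream in = Files.newInputStream(file) ) {
            fileHashes.add(DigestUtils.md5Hex(in));  // stream the file instead of loading it whole in memory
        }
    }
    // The md5 hex-strings contain only [0-9a-f], so inlining them in the query is safe.
    return "select file_hash, location from payload where file_hash in ("
            + fileHashes.stream().map(h -> "'" + h + "'").collect(Collectors.joining(", "))
            + ")";
}
```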
2) Write some important events to the bulkImportReport file, as soon as they are added to the list.
- Handle the case when "fileUtils.constructS3FilenameAndUploadToS3()" returns "null", in "processBulkImportedFile()".
- Avoid an "IllegalArgumentException" in "Lists.partition()" when the number of files to bulk-import is smaller than the number of threads available to handle them.
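A minimal sketch of one possible guard (variable names are illustrative): integer division yields 0 when there are fewer files than threads, and Guava's "Lists.partition()" rejects a non-positive partition size:

```java
import com.google.common.collect.Lists;
import java.util.List;

static List<List<String>> partitionSafely(List<String> fileLocations, int numOfThreads) {
    // With fewer files than threads, (size / numOfThreads) == 0, which would
    // make Lists.partition() throw an IllegalArgumentException.
    int sizeOfEachSubList = Math.max(1, (fileLocations.size() / numOfThreads));
    return Lists.partition(fileLocations, sizeOfEachSubList);
}
```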
- Include the last directory's "/" divider in the "fileDIR" group of the "FILEPATH_ID_EXTENSION" regex (renamed from "FILENAME_ID_EXTENSION").
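An illustrative reconstruction (the project's real pattern differs): the point is that "fileDIR" now ends with the divider, so concatenating the three groups re-assembles the full path without manual separator handling:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FilePathRegexExample {
    // Hypothetical pattern: "fileDIR" is everything up to and including the last '/'.
    static final Pattern FILEPATH_ID_EXTENSION =
            Pattern.compile("^(?<fileDIR>.*/)(?<fileNameID>[^/.]+)(?<extension>\\.[^.]+)$");

    public static void main(String[] args) {
        Matcher matcher = FILEPATH_ID_EXTENSION.matcher("/mnt/bulk_import/sectionA/doi_12345.pdf");
        if ( matcher.matches() ) {
            String fileDIR = matcher.group("fileDIR");        // "/mnt/bulk_import/sectionA/" (keeps the divider)
            String fileNameID = matcher.group("fileNameID");  // "doi_12345"
            String extension = matcher.group("extension");    // ".pdf"
            System.out.println(fileDIR + fileNameID + extension);  // prints the full path
        }
    }
}
```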
- Fix an incomplete log-message.
- Provide the "fileLocation" argument to the "DocFileData" constructor, in "processBulkImportedFile()", even though it is not used afterwards.
- Fix a bug where the "fileNameID" was used instead of the "OpenAireID" in the S3 location of bulkImported files; in aggregation the "fileNameID" is the OpenAireID, but in bulk-import it is not.
- Update dependencies.
- Code polishing.
- Move common "ExecutionException" handling-code into its own method: "GenericUtils.getSelectedStackTraceForCausedException()".
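A hedged sketch of what the extracted helper might look like (the actual signature and trimming logic may differ): unwrap the cause of the "ExecutionException" and keep only the first few stacktrace lines, so all call-sites log failures consistently:

```java
import java.util.concurrent.ExecutionException;

public static String getSelectedStackTraceForCausedException(ExecutionException ee, String initialMessage, int numOfLinesToKeep) {
    Throwable cause = (ee.getCause() != null) ? ee.getCause() : ee;  // the ExecutionException just wraps the real error
    StringBuilder sb = new StringBuilder(initialMessage).append(cause);
    StackTraceElement[] stackTrace = cause.getStackTrace();
    for ( int i = 0; (i < numOfLinesToKeep) && (i < stackTrace.length); i++ )
        sb.append("\n\tat ").append(stackTrace[i]);
    return sb.toString();
}
```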
- Avoid a double log.
- Code polishing.
- Use views over the "initialDatabase" view and tables, in order to a) reduce the amount of space used by the test-DBs and b) improve the test-DB creation performance.
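A minimal sketch of the idea, with placeholder database/table names: a view costs no extra storage and is created almost instantly, unlike a full table copy:

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;

static void createTestDbFromViews(Connection con, String testDb, List<String> initialTables) throws SQLException {
    try ( Statement st = con.createStatement() ) {
        st.execute("CREATE DATABASE IF NOT EXISTS " + testDb);
        for ( String table : initialTables )  // each view just points back to the initial data
            st.execute("CREATE VIEW " + testDb + "." + table + " AS SELECT * FROM initialDatabase." + table);
    }
}
```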
- Avoid possible failures from outdated metadata.
- Use a single query with a list of the fileHashes, instead of thousands of single-md5hash-check queries (which ran at most 6 in parallel and required a lot of I/O).
- Avoid checking the same fileHash multiple times, in case it is related to multiple payloads.
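A minimal sketch of the de-duplication (the "Payload" type is a stand-in for the project's payload model): group the payloads by fileHash first, so each distinct hash is checked once and feeds a single IN-list query like the one sketched earlier:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class Payload {  // stand-in for the project's payload model
    String fileHash;
}

static Map<String, List<Payload>> groupPayloadsByHash(List<Payload> payloads) {
    Map<String, List<Payload>> payloadsPerHash = new HashMap<>();
    for ( Payload payload : payloads )
        payloadsPerHash.computeIfAbsent(payload.fileHash, h -> new ArrayList<>()).add(payload);
    return payloadsPerHash;  // keySet() holds the distinct hashes; each is checked exactly once
}
```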
- In case of a database error, avoid completely losing the full-texts of that worker; instead, continue processing them.
- Reorder JOINs and predicates to reduce the computational cost.
- Remove the memory-costly "pu.url" predicates from the "where" clause, as the DB no longer contains empty urls.
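A purely illustrative example of the shape of such a rewrite; the production tables, aliases and predicates are not reproduced here:

```java
// Illustrative query only: the most selective join is placed first, and the
// now-redundant empty-url predicates are dropped from the "where" clause.
String exampleQuery =
        "select pu.url, p.id"
      + " from publication p"
      + " join publication_urls pu on pu.id = p.id"       // most selective join first
      + " left join attempt a on a.original_url = pu.url"
      + " where a.original_url is null";                  // removed: "and pu.url is not null and pu.url != ''"
```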
- Make sure we remove the assignments of all "not-successful" old worker-reports, including those which failed to be renamed to indicate success or failure, or which failed to be executed by the background threads (and thus never reached the renaming stage).
- Submit each task for execution immediately, instead of waiting for a scheduling thread to send all the gathered tasks (up to that point) to the ExecutorService and block until they finish, before it can start gathering again.
- Hold the Future of each submitted task in a synchronized list, in order to check each task's result at a scheduled time (see the sketch below).
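A sketch of the new flow under these assumptions (class and field names are illustrative): tasks go straight to the ExecutorService, and their Futures are collected for a later, scheduled check:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ImmediateSubmitter {
    private final ExecutorService executor = Executors.newFixedThreadPool(6);
    private final List<Future<Boolean>> futures = Collections.synchronizedList(new ArrayList<>());

    void submit(Callable<Boolean> task) {
        futures.add(executor.submit(task));  // no batching, no blocking on previously gathered tasks
    }

    // Invoked at a scheduled time (e.g. by a Spring "@Scheduled" method).
    void checkResultsOfFinishedTasks() {
        synchronized ( futures ) {  // a synchronizedList still needs manual locking for iteration
            Iterator<Future<Boolean>> it = futures.iterator();
            while ( it.hasNext() ) {
                Future<Boolean> future = it.next();
                if ( !future.isDone() )
                    continue;  // leave unfinished tasks for the next scheduled run
                try {
                    future.get();  // surfaces any exception the task threw
                } catch (ExecutionException | InterruptedException e) {
                    // log it, e.g. via the common ExecutionException handler shown earlier
                } finally {
                    it.remove();
                }
            }
        }
    }
}
```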
- Reduce the cpu-time needed to assure the Service can shut down, by checking for "actively-executing" and "about-to-be-executed" tasks at the same time (see the sketch below), instead of relying on the additional check of each worker's "shutdown"-status to verify that no active tasks exist.
- Improve the threads' shutdown procedure.
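A hedged sketch of the cheaper check, assuming the executor is a "ThreadPoolExecutor": it reports both the actively-running and the still-queued tasks in one place, so no per-worker "shutdown"-status polling is needed:

```java
import java.util.concurrent.ThreadPoolExecutor;

static boolean noTasksLeft(ThreadPoolExecutor executor) {
    // getActiveCount() counts the "actively" running tasks (an approximation, so
    // callers may re-check after a short sleep); getQueue() holds the
    // "about-to-be-executed" ones.
    return (executor.getActiveCount() == 0) && executor.getQueue().isEmpty();
}
```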