Commit Graph

4520 Commits

Author SHA1 Message Date
Serafeim Chatzopoulos db03f85366 Remove steps for updating BIP! from the impact indicators workflow 2024-09-04 14:25:44 +03:00
Miriam Baglioni 468f2aa5a5 [AffiliationAffRo]align beta with new affiliation from publisher webpage introduced in production. AffRo collectedfrom OpenAIRE to discriminate against WebCrawl 2024-08-12 18:10:46 +02:00
Miriam Baglioni 89fcf4086c [Person]fix issue in affiliation relation id construction for person (missing ::) 2024-08-12 18:04:43 +02:00
Miriam Baglioni 8c185a7b1a resolving conflicts 2024-08-05 17:14:11 +02:00
Claudio Atzori 8e7ef79ce0 [bip affiliations] considers only DOI based records 2024-08-05 12:13:48 +02:00
Miriam Baglioni 985ca15264 [openaire-affiliation]removes matchings without DOI 2024-08-05 12:10:40 +02:00
Claudio Atzori fecbf93e0e Merge pull request 'FoS L1 & L2' (#465) from fos_l1l2 into beta
Reviewed-on: #465
2024-08-01 13:58:04 +02:00
Claudio Atzori 64740475d0 depending on dhp-schemas:7.0.1 2024-07-29 11:51:42 +02:00
Miriam Baglioni 1af6571474 merging with branch beta 2024-07-25 15:48:05 +02:00
Claudio Atzori a81c555fe6 [graph provision] include only FoS L1..L2 in the record serialization 2024-07-25 15:26:47 +02:00
Claudio Atzori 359b8ebda8 [graph provision] include only FoS L1..L2 in the record serialization 2024-07-25 15:22:29 +02:00
Miriam Baglioni c7f6669f1a [webcrawl] the blacklist is now in json and no more in csv after the normalization process 2024-07-25 15:20:18 +02:00
Miriam Baglioni 7cff281d3e [webcrawl] the blacklist is now in json and no more in csv after the normalization process 2024-07-25 15:16:42 +02:00
Claudio Atzori d4bf449e8c minor 2024-07-25 14:53:06 +02:00
Miriam Baglioni fc60661ac5 [webcrawl] added code and test (code/resource) to verify the deletion of the relations related to results put in blacklist 2024-07-25 12:25:14 +02:00
Claudio Atzori d771a883f9 [dedup] updated sql query used to read organizations from the OpenOrgs DB to include their typology 2024-07-25 09:53:48 +02:00
Claudio Atzori 01958a3e07 [graph provision] addded filter to exclude records marked with datainfo.deletedbyinference = true 2024-07-24 10:00:10 +02:00
Miriam Baglioni 6f1801d7d1 [webcrawl]- 2024-07-23 17:34:48 +02:00
Miriam Baglioni 19806c2ae3 [SDG]fixed switch of methods 2024-07-23 17:12:55 +02:00
Antonis Lempesis d0590e0e49 added latest institutions 2024-07-23 15:17:15 +03:00
Antonis Lempesis 7d2c0a3723 added new institutions 2024-07-23 15:10:17 +03:00
Miriam Baglioni 62649dc5c4 merging with branch beta 2024-07-23 12:50:12 +02:00
Miriam Baglioni 9573bf576d [SDG]added code to ingest also the SDG without DOI 2024-07-23 12:47:57 +02:00
Michele Artini d27e9ea50f added ODF invisible stores in raw_all workflow 2024-07-23 09:56:27 +02:00
Michele De Bonis 4f4c73d65b minor change: addition of missing parameter in sql query 2024-07-22 15:19:02 +02:00
Miriam Baglioni 79985ad197 [Crossref]added mapping for DFG versus the unidentified project [https://support.openaire.eu/issues/9926?next_issue_id=9924&prev_issue_id=9927#note-4] 2024-07-17 18:30:24 +02:00
Claudio Atzori 06e3985b77 merged from beta 2024-07-17 12:01:40 +02:00
Claudio Atzori 83327239de fixed pom definitions, bumped dependency version for the dhp-schema module, removed unnecessary dependencies 2024-07-17 11:58:48 +02:00
Claudio Atzori db9c54c944 Revert "removed legacy actionmanager dependencies"
This reverts commit bb12d0b4df.
2024-07-17 11:27:43 +02:00
Claudio Atzori e39e8bbd47 Merge pull request '[WebCrawlAffiliation]remove from the creation of the action set the relations for pmc and pmid. Only doi are allowed' (#462) from affiliationFromWebCrawlOnlyDOI into beta
Reviewed-on: #462
2024-07-17 11:12:32 +02:00
Claudio Atzori e94ae771ff Merge pull request '[BulkTag]added tagging for the organization relevant for the community.' (#461) from tagOrganization into beta
Reviewed-on: #461
2024-07-17 11:11:52 +02:00
Claudio Atzori 78b5e4bb6f reverted changed contens under dhp-graph-provision 2024-07-17 10:48:20 +02:00
Claudio Atzori 40c5d87645 Merge pull request '[graph provision] entity level contexts' (#460) from entity_contexts into beta
Reviewed-on: #460
2024-07-17 10:43:21 +02:00
Claudio Atzori a65241fcaf Merge pull request 'implementation of the new collector plugin: research_fi' (#456) from research_fi_collector_plugin into beta
Reviewed-on: #456
2024-07-17 10:25:38 +02:00
Claudio Atzori 6665976604 Merge pull request 'Optimizations for the Openorgs Dedup: normalization and inference of strings and implementation of new general-purpose comparators' (#455) from openorgs_optimization into beta
Reviewed-on: #455
2024-07-17 10:25:20 +02:00
Claudio Atzori c99f92efaa Merge pull request '[beta] OpenAIRE Affiliation Inference' (#452) from affRoFromRawString into beta
Reviewed-on: #452
2024-07-17 10:24:39 +02:00
Claudio Atzori f17e1243ba reverted changed contens under dhp-graph-provision 2024-07-17 10:23:50 +02:00
Claudio Atzori 6a19337dab Merge pull request 'removed legacy actionmanager dependencies' (#454) from cleanup_actionmanager_deps into beta
Reviewed-on: #454
2024-07-17 10:20:44 +02:00
Miriam Baglioni d96215cb9b [UnpayWall]added othe : in the identifier construction 2024-07-16 18:17:32 +02:00
Miriam Baglioni 9246bdec1c [WebCrawlAffiliation]remove from the creation of the action set the relations for pmc and pmid. Only doi are allowed 2024-07-16 14:07:37 +02:00
Miriam Baglioni 9d27910144 [BulkTag]added tagging for the organization relevant for the community. Added test. Changed the tagging variables. 2024-07-16 13:48:48 +02:00
Claudio Atzori beb93cdfe9 [graph provision] expand the context info for each entity type 2024-07-16 11:43:48 +02:00
Claudio Atzori 38f8ed27fd [graph provision] log the Solr admin application operations for alias deletion and creation 2024-07-15 16:30:43 +02:00
Claudio Atzori 1fb44198fb renamed workflow to better reflect its purpose 2024-07-15 15:24:38 +02:00
Claudio Atzori 6f6e85ddf4 code formatting 2024-07-15 09:32:04 +02:00
Claudio Atzori 7fa3d51200 renamed class, updated criteria to consider the ORCIDs used in the matchers 2024-07-15 09:18:58 +02:00
Michele Artini f99fb21040 tests 2024-07-15 09:18:46 +02:00
Claudio Atzori e17edb2581 [broker] fine tuned the workflow memory settings 2024-07-12 10:27:50 +02:00
Claudio Atzori 61d1fa9b9f [metadata collection] added -Dcom.sun.security.enableAIAcaIssuers=true as a default for metadata collection 2024-07-12 10:26:45 +02:00
Claudio Atzori f9ed2ae33c [metadata collection] added the possibility to specify the JAVA_HOME and the JAVA_OPTS parameters 2024-07-11 15:32:36 +02:00
Michele Artini bbe52584f7 log message 2024-07-11 15:14:34 +02:00
Michele Artini 5cdba9172b implementeation of the new collector plugin: research_fi 2024-07-10 14:53:13 +02:00
Michele De Bonis 2a36ccb997 optimization of normalization stage in openorgs workflow, implementation of new comparators replacing older versions, openorgs configuration update, addition of inference flag in model definition, new test classes 2024-07-09 16:58:10 +02:00
Miriam Baglioni c465835061 [Person]new implementation for the extraction of the coAuthorship relations 2024-07-09 12:29:55 +02:00
Miriam Baglioni 814e650e12 [Irish Tender]changed the irish.json file according to comments #26, #29, and #34 for 9635 2024-07-04 12:24:28 +02:00
Miriam Baglioni ddd20e7f8e [Person]first implementation of the action set to include Person entity in the graph starting from the orcid data 2024-07-04 12:08:46 +02:00
Lampros Smyrnaios e9686365a2 Improve performance of creating the "result_fos" table, by using a temp-table to cache data, which is requested multiple times. 2024-07-03 20:24:36 +03:00
Lampros Smyrnaios ce0aee21cc Improve performance of transferring the stats-DBs to another cluster and querying the DBs' tables, by ordering Spark to create up to 100 files per table, instead of thousands. 2024-07-03 20:15:33 +03:00
Lampros Smyrnaios 7b7dd32ad5 - Fix placement of some "set mapred.job.queue.name=analytics" statements and remove their unused "/*EOS*/" indicator.
- Add stacktrace-info to failed actions.
2024-07-03 19:53:24 +03:00
Lampros Smyrnaios 7ce051d766 - Update the remaining hive-actions to spark-actions.
- Update the version of shell-actions.
- Fix missing "/*EOS*/" indicators.
2024-07-03 19:49:19 +03:00
Lampros Smyrnaios aa4d7d5e20 Prioritize the rest of the stats-queries over other tasks on the cluster, by putting them in the "analytics" queue. 2024-07-03 19:14:25 +03:00
Claudio Atzori bb12d0b4df removed legacy actionmanager dependencies 2024-07-03 16:26:39 +02:00
Lampros Smyrnaios 54e11b6a43 Improve performance and efficiency by rewriting the creation process of "publication", "project", "dataset", "datasource", "software", "otherresearchproduct" and "result" tables, to be performed in a single query, for each one. 2024-07-03 13:03:15 +03:00
Miriam Baglioni a2b708bb71 [AffiliationIngestion]refactoring 2024-06-29 18:36:47 +02:00
Miriam Baglioni 9cbe966b4a [AffiliationIngestion]refactoring 2024-06-29 18:35:49 +02:00
Miriam Baglioni 236b64d830 [AffiliationIngestion]Extended the ingestion of affiliation from open aire to include also links derived from Web Crawl. Extended the test. Inserted in Constatns the id and name of the webcrawl datasource to be used here and also in the ingestion of links from web crawl 2024-06-29 18:29:20 +02:00
Miriam Baglioni 67ff783e65 [Person]First implementation to include Person entity in the graph 2024-06-29 17:13:01 +02:00
Michele De Bonis a10e8d9f05 implementation of countryMatch and addition of workflow parameters 2024-06-28 16:46:52 +02:00
Claudio Atzori 14539f9c8b [graph provision] publicFormat worfklow parameter defined as optional 2024-06-28 14:55:18 +02:00
Claudio Atzori 1bc8c5d173 [graph provision] fixed serialization of the instancetypes 2024-06-28 14:54:28 +02:00
Claudio Atzori 1ccf01cdb8 Using the updated Solr JSON payload model classes 2024-06-28 12:38:07 +02:00
Claudio Atzori b79cb155ba Merge pull request 'Fix permissions-issue in Stats-workflow, step22a-createPDFsAggregated.' (#450) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: #450
2024-06-26 10:11:34 +02:00
Claudio Atzori 33a02c5b9e Merge pull request 'Change the selection criteria for the pivot record of a group so that by best pid type becomes the first criteria. This will have the effect to converge to records having DOI pid' (#446) from pivotselectionbypid into beta
Reviewed-on: #446
2024-06-26 10:10:13 +02:00
Lampros Smyrnaios fe2275a9b0 Merge branch 'beta' of https://code-repo.d4science.org/antonis.lempesis/dnet-hadoop into convert_hive_to_spark_actions
# Conflicts:
#	dhp-workflows/dhp-stats-update/src/main/resources/eu/dnetlib/dhp/oa/graph/stats/oozie_app/scripts/step14.sql
2024-06-25 20:17:47 +03:00
Claudio Atzori 1c30eacac2 updated index feeding procedure to exploit the collection aliases 2024-06-25 15:27:38 +02:00
Claudio Atzori 6055212f77 merged from the json_payload branch 2024-06-25 12:39:02 +02:00
Claudio Atzori 0031cf849e Merge branch 'beta' into 9872-create-solr-collection-aliases 2024-06-25 09:58:01 +02:00
Serafeim Chatzopoulos 9f6e16a03c Add support to cretate/update solr collection aliases 2024-06-20 16:03:15 +03:00
Lampros Smyrnaios 66cd28f70a - Fix not using the "export HADOOP_USER_NAME" statement in "createPDFsAggregated.sh", which caused permission-issues when creating tables with Impala.
- Remove unused "--user" parameter in "impala-shell" calls.
- Code polishing.
2024-06-20 14:33:46 +03:00
Miriam Baglioni d35edac212 [IrishFunderList]make changed according to 9635 comment 20, 21, 22 and 23 2024-06-20 12:28:28 +02:00
Miriam Baglioni 6421f8fece Merge remote-tracking branch 'origin/beta' into beta 2024-06-19 11:12:15 +02:00
Miriam Baglioni ac270f795b [IrishFunderList]make changed according to 9635 comment 14, 15 and 16 2024-06-19 11:11:52 +02:00
Lampros Smyrnaios 285416c74e Merge branch 'beta' into beta 2024-06-18 13:50:38 +02:00
Lampros Smyrnaios 3095047e5e Miscellaneous updates to the copying operation to Impala Cluster:
- Fix not breaking out of the VIEWS-infinite-loop when the "SHOULD_EXIT_WHOLE_SCRIPT_UPON_ERROR" is set to "false".
- Exit the script when no HDFS-active-node was found, independently of the "SHOULD_EXIT_WHOLE_SCRIPT_UPON_ERROR".
- Fix view_name-recognition in a log-message, by using the more advanced "Perl-Compatible Regular Expressions" in "grep".
- Add error-handling for "compute stats" errors.
2024-06-18 14:40:41 +03:00
Antonis Lempesis 0456f1b788 Merge remote-tracking branch 'origin/beta' into beta 2024-06-14 15:11:30 +03:00
Antonis Lempesis 38636942c7 filtering out deletedbyinference and invinsible results from accessroute 2024-06-14 15:11:19 +03:00
Lampros Smyrnaios d942a1101b Miscellaneous updates to the copying operation to Impala Cluster:
- Show some counts and the elapsed time for various sub-tasks.
- Code polishing.
2024-06-14 12:14:38 +03:00
Giambattista Bloisi 9bf2bda1c6 Fix: next returned a null value at end of stream 2024-06-12 13:28:51 +02:00
Giambattista Bloisi d90cb099b8 Fix for paginationStart parameter management 2024-06-11 20:23:44 +02:00
Giambattista Bloisi 4f2a61e10f Change the selection criteria for the pivot record of a group so that by best pid type becomes the first criteria. This will have the effect to slowly converge to records having DOI pid 2024-06-11 15:33:56 +02:00
Claudio Atzori 11fe3a4fe0 [graph resolution] use sparkExecutorMemory to define also the memoryOverhead 2024-06-11 14:21:17 +02:00
Miriam Baglioni 8fe934810f Merge remote-tracking branch 'origin/beta' into beta 2024-06-11 10:28:51 +02:00
Miriam Baglioni 9da006e98c [SDGFoSActionSet]remove datainfo for the result. It is not needed (qualifier.classid = UPDATE) useless since subject do not go at the level of the instance 2024-06-11 10:28:32 +02:00
Giambattista Bloisi 85c1eae7e0 Fixes for pagination strategy looping at end of download 2024-06-10 19:03:58 +02:00
Claudio Atzori b0eba210c0 [actionset promotion] use sparkExecutorMemory to define also the memoryOverhead 2024-06-10 16:15:24 +02:00
Claudio Atzori 3776327a8c hostedby patching to work with the updated Crossref contents, resolved conflict 2024-06-10 15:24:12 +02:00
Claudio Atzori 0139f23d66 Merge pull request 'organization type from OpenOrgs' (#445) from import_openorg_type into beta
Reviewed-on: #445
2024-06-07 12:17:31 +02:00
Michele Artini c726572418 changed some parameters in OSF test 2024-06-07 12:03:26 +02:00
Claudio Atzori ec79405cc9 [graph raw] set organization type from openorgs 2024-06-07 11:30:31 +02:00
Miriam Baglioni 1477406ecc [bulkTag] fixed issue that made project disappear in graph_10_enriched 2024-06-06 10:45:41 +02:00