Claudio Atzori
07e7b9315c
code formatting
2024-08-02 14:42:24 +02:00
Alessia
39810c6e7e
Rest collector plugin on hadoop supports a new param to pass request headers
2024-08-02 14:41:43 +02:00
Claudio Atzori
e0f58afd30
[graph provision] include only FoS L1..L2 in the record serialization
2024-08-02 10:58:57 +02:00
Claudio Atzori
60cf7d86a1
[graph provision] include only FoS L1..L2 in the record serialization
2024-08-02 10:58:47 +02:00
Claudio Atzori
fecbf93e0e
Merge pull request 'FoS L1 & L2' ( #465 ) from fos_l1l2 into beta
Reviewed-on: #465
2024-08-01 13:58:04 +02:00
Claudio Atzori
64740475d0
depending on dhp-schemas:7.0.1
2024-07-29 11:51:42 +02:00
Miriam Baglioni
1af6571474
merging with branch beta
2024-07-25 15:48:05 +02:00
Claudio Atzori
a81c555fe6
[graph provision] include only FoS L1..L2 in the record serialization
2024-07-25 15:26:47 +02:00
Claudio Atzori
359b8ebda8
[graph provision] include only FoS L1..L2 in the record serialization
2024-07-25 15:22:29 +02:00
Miriam Baglioni
c7f6669f1a
[webcrawl] the blacklist is now in JSON and no longer in CSV after the normalization process
2024-07-25 15:20:18 +02:00
Miriam Baglioni
7cff281d3e
[webcrawl] the blacklist is now in JSON and no longer in CSV after the normalization process
2024-07-25 15:16:42 +02:00
Claudio Atzori
d4bf449e8c
minor
2024-07-25 14:53:06 +02:00
Miriam Baglioni
fc60661ac5
[webcrawl] added code and test (code/resource) to verify the deletion of the relations related to results put in blacklist
2024-07-25 12:25:14 +02:00
Claudio Atzori
d771a883f9
[dedup] updated sql query used to read organizations from the OpenOrgs DB to include their typology
2024-07-25 09:53:48 +02:00
Claudio Atzori
01958a3e07
[graph provision] added filter to exclude records marked with datainfo.deletedbyinference = true
2024-07-24 10:00:10 +02:00
Miriam Baglioni
6f1801d7d1
[webcrawl]-
2024-07-23 17:34:48 +02:00
Miriam Baglioni
19806c2ae3
[SDG]fixed switch of methods
2024-07-23 17:12:55 +02:00
Miriam Baglioni
62649dc5c4
merging with branch beta
2024-07-23 12:50:12 +02:00
Miriam Baglioni
9573bf576d
[SDG]added code to ingest also the SDG without DOI
2024-07-23 12:47:57 +02:00
Michele Artini
d27e9ea50f
added ODF invisible stores in raw_all workflow
2024-07-23 09:56:27 +02:00
Michele De Bonis
4f4c73d65b
minor change: addition of missing parameter in sql query
2024-07-22 15:19:02 +02:00
Miriam Baglioni
79985ad197
[Crossref]added mapping for DFG versus the unidentified project [ https://support.openaire.eu/issues/9926?next_issue_id=9924&prev_issue_id=9927#note-4 ]
2024-07-17 18:30:24 +02:00
Claudio Atzori
06e3985b77
merged from beta
2024-07-17 12:01:40 +02:00
Claudio Atzori
83327239de
fixed pom definitions, bumped dependency version for the dhp-schema module, removed unnecessary dependencies
2024-07-17 11:58:48 +02:00
Claudio Atzori
db9c54c944
Revert "removed legacy actionmanager dependencies"
This reverts commit bb12d0b4df.
2024-07-17 11:27:43 +02:00
Claudio Atzori
e39e8bbd47
Merge pull request '[WebCrawlAffiliation]remove from the creation of the action set the relations for pmc and pmid. Only doi are allowed' ( #462 ) from affiliationFromWebCrawlOnlyDOI into beta
Reviewed-on: #462
2024-07-17 11:12:32 +02:00
Claudio Atzori
e94ae771ff
Merge pull request '[BulkTag]added tagging for the organization relevant for the community.' ( #461 ) from tagOrganization into beta
Reviewed-on: #461
2024-07-17 11:11:52 +02:00
Claudio Atzori
78b5e4bb6f
reverted changed contents under dhp-graph-provision
2024-07-17 10:48:20 +02:00
Claudio Atzori
40c5d87645
Merge pull request '[graph provision] entity level contexts' ( #460 ) from entity_contexts into beta
Reviewed-on: #460
2024-07-17 10:43:21 +02:00
Claudio Atzori
a65241fcaf
Merge pull request 'implementation of the new collector plugin: research_fi' ( #456 ) from research_fi_collector_plugin into beta
Reviewed-on: #456
2024-07-17 10:25:38 +02:00
Claudio Atzori
6665976604
Merge pull request 'Optimizations for the Openorgs Dedup: normalization and inference of strings and implementation of new general-purpose comparators' ( #455 ) from openorgs_optimization into beta
Reviewed-on: #455
2024-07-17 10:25:20 +02:00
Claudio Atzori
c99f92efaa
Merge pull request '[beta] OpenAIRE Affiliation Inference' ( #452 ) from affRoFromRawString into beta
Reviewed-on: #452
2024-07-17 10:24:39 +02:00
Claudio Atzori
f17e1243ba
reverted changed contents under dhp-graph-provision
2024-07-17 10:23:50 +02:00
Claudio Atzori
6a19337dab
Merge pull request 'removed legacy actionmanager dependencies' ( #454 ) from cleanup_actionmanager_deps into beta
Reviewed-on: #454
2024-07-17 10:20:44 +02:00
Miriam Baglioni
8f11dfe554
[UnpayWall]added othe : in the identifier construction
2024-07-16 18:18:38 +02:00
Miriam Baglioni
d96215cb9b
[UnpayWall]added othe : in the identifier construction
2024-07-16 18:17:32 +02:00
Miriam Baglioni
9246bdec1c
[WebCrawlAffiliation]remove from the creation of the action set the relations for pmc and pmid. Only doi are allowed
2024-07-16 14:07:37 +02:00
Miriam Baglioni
9d27910144
[BulkTag]added tagging for the organization relevant for the community. Added test. Changed the tagging variables.
2024-07-16 13:48:48 +02:00
Claudio Atzori
beb93cdfe9
[graph provision] expand the context info for each entity type
2024-07-16 11:43:48 +02:00
Claudio Atzori
d20a5e020a
[graph provision] log the Solr admin application operations for alias deletion and creation
2024-07-15 16:31:04 +02:00
Claudio Atzori
38f8ed27fd
[graph provision] log the Solr admin application operations for alias deletion and creation
2024-07-15 16:30:43 +02:00
Claudio Atzori
1fb44198fb
renamed workflow to better reflect its purpose
2024-07-15 15:24:38 +02:00
Claudio Atzori
3d1d8e6036
renamed workflow to better reflect its purpose
2024-07-15 15:24:18 +02:00
Claudio Atzori
6f6e85ddf4
code formatting
2024-07-15 09:32:04 +02:00
Claudio Atzori
7fa3d51200
renamed class, updated criteria to consider the ORCIDs used in the matchers
2024-07-15 09:18:58 +02:00
Michele Artini
f99fb21040
tests
2024-07-15 09:18:46 +02:00
Claudio Atzori
b70a440aca
renamed class, updated criteria to consider the ORCIDs used in the matchers
2024-07-12 17:09:01 +02:00
Michele Artini
36c3df1652
tests
2024-07-12 15:29:45 +02:00
Claudio Atzori
e17edb2581
[broker] fine tuned the workflow memory settings
2024-07-12 10:27:50 +02:00
Claudio Atzori
2f13683285
[broker] fine tuned the workflow memory settings
2024-07-12 10:27:24 +02:00
Claudio Atzori
61d1fa9b9f
[metadata collection] added -Dcom.sun.security.enableAIAcaIssuers=true as a default for metadata collection
2024-07-12 10:26:45 +02:00
Claudio Atzori
5ab409dcab
[metadata collection] added -Dcom.sun.security.enableAIAcaIssuers=true as a default for metadata collection
2024-07-12 10:26:32 +02:00
Claudio Atzori
f9ed2ae33c
[metadata collection] added the possibility to specify the JAVA_HOME and the JAVA_OPTS parameters
2024-07-11 15:32:36 +02:00
Claudio Atzori
51d6a541bd
[metadata collection] added the possibility to specify the JAVA_HOME and the JAVA_OPTS parameters
2024-07-11 15:24:29 +02:00
Michele Artini
bbe52584f7
log message
2024-07-11 15:14:34 +02:00
Claudio Atzori
07ce92cef2
[OAI-PMH] fixed node name
2024-07-11 11:00:23 +02:00
Michele Artini
5cdba9172b
implementation of the new collector plugin: research_fi
2024-07-10 14:53:13 +02:00
Michele De Bonis
2a36ccb997
optimization of normalization stage in openorgs workflow, implementation of new comparators replacing older versions, openorgs configuration update, addition of inference flag in model definition, new test classes
2024-07-09 16:58:10 +02:00
Miriam Baglioni
c465835061
[Person]new implementation for the extraction of the coAuthorship relations
2024-07-09 12:29:55 +02:00
Miriam Baglioni
814e650e12
[Irish Tender]changed the irish.json file according to comments #26 , #29 , and #34 for 9635
2024-07-04 12:24:28 +02:00
Miriam Baglioni
f043b7b096
[Irish Tender]changed the irish.json file according to comments #26 , #29 , and #34 for 9635
2024-07-04 12:22:56 +02:00
Miriam Baglioni
ddd20e7f8e
[Person]first implementation of the action set to include Person entity in the graph starting from the orcid data
2024-07-04 12:08:46 +02:00
Claudio Atzori
bb12d0b4df
removed legacy actionmanager dependencies
2024-07-03 16:26:39 +02:00
Claudio Atzori
ed97ba4565
Merge pull request '[prod] Openaire Affiliation Inference' ( #453 ) from affRoFromRawStringmain into main
Reviewed-on: #453
2024-07-03 12:32:26 +02:00
Claudio Atzori
7b398a6d0b
updated import of organization types from OpenOrgs
2024-07-03 11:11:35 +02:00
Claudio Atzori
13f6506ce5
Change the selection criteria for the pivot record of a group so that the best pid type becomes the first criterion. This will have the effect of slowly converging to records having a DOI
2024-07-03 10:44:01 +02:00
Claudio Atzori
3d9ddaa23a
importing organization types from OpenOrgs
2024-07-03 10:15:37 +02:00
Michele De Bonis
ea1841fbd2
implementation of countryMatch and addition of workflow parameters
2024-07-01 09:14:32 +02:00
Miriam Baglioni
4dbce39237
[AffiliationInference]Extended the affiliation ingestion from OpenAIRE to include also the links derived from web crawl. Changed the provenance from BIP! to OpenAIRE
2024-06-29 18:51:06 +02:00
Miriam Baglioni
3ee8a7d18a
[WebCrawl]moved to Constants web crawl name and id
2024-06-29 18:47:23 +02:00
Miriam Baglioni
a2b708bb71
[AffiliationIngestion]refactoring
2024-06-29 18:36:47 +02:00
Miriam Baglioni
9cbe966b4a
[AffiliationIngestion]refactoring
2024-06-29 18:35:49 +02:00
Miriam Baglioni
236b64d830
[AffiliationIngestion]Extended the ingestion of affiliations from OpenAIRE to include also links derived from Web Crawl. Extended the test. Inserted in Constants the id and name of the webcrawl datasource to be used here and also in the ingestion of links from web crawl
2024-06-29 18:29:20 +02:00
Miriam Baglioni
67ff783e65
[Person]First implementation to include Person entity in the graph
2024-06-29 17:13:01 +02:00
Michele De Bonis
a10e8d9f05
implementation of countryMatch and addition of workflow parameters
2024-06-28 16:46:52 +02:00
Claudio Atzori
14539f9c8b
[graph provision] publicFormat workflow parameter defined as optional
2024-06-28 14:55:18 +02:00
Claudio Atzori
1bc8c5d173
[graph provision] fixed serialization of the instancetypes
2024-06-28 14:54:28 +02:00
Claudio Atzori
ee7deb3f60
[graph provision] publicFormat workflow parameter defined as optional
2024-06-28 14:52:43 +02:00
Claudio Atzori
157cc8be87
[graph provision] fixed serialization of the instancetypes
2024-06-28 14:21:12 +02:00
Claudio Atzori
1ccf01cdb8
Using the updated Solr JSON payload model classes
2024-06-28 12:38:07 +02:00
Claudio Atzori
023099a921
imported from beta
2024-06-26 11:40:16 +02:00
Claudio Atzori
786c217085
Using the updated Solr JSON payload model classes
2024-06-26 11:11:33 +02:00
Claudio Atzori
b79cb155ba
Merge pull request 'Fix permissions-issue in Stats-workflow, step22a-createPDFsAggregated.' ( #450 ) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: #450
2024-06-26 10:11:34 +02:00
Lampros Smyrnaios
c858c02111
- Fix not using the "export HADOOP_USER_NAME" statement in "createPDFsAggregated.sh", which caused permission-issues when creating tables with Impala.
- Remove unused "--user" parameter in "impala-shell" calls.
- Code polishing.
2024-06-26 10:11:21 +02:00
Claudio Atzori
33a02c5b9e
Merge pull request 'Change the selection criteria for the pivot record of a group so that the best pid type becomes the first criterion. This will have the effect of converging to records having a DOI pid' ( #446 ) from pivotselectionbypid into beta
Reviewed-on: #446
2024-06-26 10:10:13 +02:00
Claudio Atzori
1c30eacac2
updated index feeding procedure to exploit the collection aliases
2024-06-25 15:27:38 +02:00
Claudio Atzori
6055212f77
merged from the json_payload branch
2024-06-25 12:39:02 +02:00
Claudio Atzori
0031cf849e
Merge branch 'beta' into 9872-create-solr-collection-aliases
2024-06-25 09:58:01 +02:00
Claudio Atzori
8220e27110
Merge pull request 'Align Solr JSON records to the explore portal requirements' ( #448 ) from json_payload into beta_to_master_may2024
Reviewed-on: #448
2024-06-25 09:57:40 +02:00
Claudio Atzori
1dc7458de2
added JSON payload to the SolrInputDocument, updated unit tests
2024-06-24 14:48:09 +02:00
Claudio Atzori
a7a54aab47
WIP: align Solr JSON records to the explore portal requirements
2024-06-20 15:48:45 +02:00
Serafeim Chatzopoulos
9f6e16a03c
Add support to create/update Solr collection aliases
2024-06-20 16:03:15 +03:00
Lampros Smyrnaios
66cd28f70a
- Fix not using the "export HADOOP_USER_NAME" statement in "createPDFsAggregated.sh", which caused permission-issues when creating tables with Impala.
- Remove unused "--user" parameter in "impala-shell" calls.
- Code polishing.
2024-06-20 14:33:46 +03:00
Miriam Baglioni
eaa00a4199
[IrishFunderList]made changes according to 9635 comments 20, 21, 22 and 23
2024-06-20 12:32:57 +02:00
Miriam Baglioni
d35edac212
[IrishFunderList]made changes according to 9635 comments 20, 21, 22 and 23
2024-06-20 12:28:28 +02:00
Claudio Atzori
fb731b6d46
WIP: align Solr JSON records to the explore portal requirements
2024-06-19 15:38:43 +02:00
Miriam Baglioni
6421f8fece
Merge remote-tracking branch 'origin/beta' into beta
2024-06-19 11:12:15 +02:00
Miriam Baglioni
ac270f795b
[IrishFunderList]made changes according to 9635 comments 14, 15 and 16
2024-06-19 11:11:52 +02:00
Miriam Baglioni
b6da35e736
[IrishFunderList]made changes according to 9635 comments 14, 15 and 16
2024-06-19 11:06:58 +02:00
Lampros Smyrnaios
3c9b8de892
Miscellaneous updates to the copying operation to Impala Cluster:
- Fix not breaking out of the VIEWS-infinite-loop when the "SHOULD_EXIT_WHOLE_SCRIPT_UPON_ERROR" is set to "false".
- Exit the script when no HDFS-active-node was found, independently of the "SHOULD_EXIT_WHOLE_SCRIPT_UPON_ERROR".
- Fix view_name-recognition in a log-message, by using the more advanced "Perl-Compatible Regular Expressions" in "grep".
- Add error-handling for "compute stats" errors.
2024-06-18 15:59:34 +02:00