4650 Commits

Author SHA1 Message Date
Claudio Atzori 15227f82b8 added related author's given name and family name in the solr json payload serialisation 2024-11-20 15:52:40 +01:00
Miriam Baglioni 3081cad1d3 [CommunityAPI] refactoring 2024-11-20 14:03:59 +01:00
Miriam Baglioni 6beb94adee [SubCommunity] Extension of the Utils methods to also add the associations between the subcommunities and organization/project/datasources 2024-11-20 10:59:49 +01:00
sandro.labruzzo ac8995ab64 Merge remote-tracking branch 'origin/beta' into crossref_mapping_improvement 2024-11-20 09:52:51 +01:00
sandro.labruzzo 496007188a Added assertion on CrossrefMappingTest 2024-11-20 09:50:09 +01:00
Miriam Baglioni 9dbcf19efb [SubCommunity] Extension of communityApis to also add the associations between the subcommunities and organization/project/datasources 2024-11-20 09:16:33 +01:00
Claudio Atzori 4e55ddc547 [PubMed aggregation] storing contents into mdStoreVersion/store 2024-11-19 16:50:42 +01:00
Claudio Atzori ff5cb32067 Merge pull request 'abstracts in ODF records from the datacite and the dc nsPrefixes' (#508) from abtracts_guidelines4 into beta
Reviewed-on: #508
2024-11-19 15:12:53 +01:00
Claudio Atzori a48d080e08 Merge pull request 'Improve OAF Generation from Baseline PubMed Collection' (#504) from pubmed_fix into beta
Reviewed-on: #504
2024-11-19 15:12:37 +01:00
sandro.labruzzo a1297082e2 Crossref Enhancements:
- Accurate Review Type Assignment: Resolved an issue identified in ticket https://support.openaire.eu/issues/9525#note-13. When a relationship of "is-review-of" is detected, the publication type is now correctly set to "Review."
- Enhanced Author Affiliation Data: Implemented Miriam's suggestion by including a new field, "RawAffiliationString," in each author entry. This additional data provides a more granular level of detail regarding author affiliations, potentially improving discoverability and research analysis.
2024-11-19 14:57:18 +01:00
Miriam Baglioni cea2de2c37 [SubCommunity] Extension of CommunityAPIs for bulk tagging 2024-11-19 14:50:42 +01:00
Claudio Atzori 9e439f5eca map the abstracts considering both the datacite and the dc nsPrefix 2024-11-15 12:19:26 +01:00
Claudio Atzori cf7d9a32ab disable autoBroadcastJoin in the cleaning workflow 2024-11-15 09:17:28 +01:00
Claudio Atzori 5f512f510e code formatting 2024-11-15 09:16:51 +01:00
Claudio Atzori 9e8849b753 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2024-11-13 20:41:51 +01:00
sandro.labruzzo 4778a70478 Merge remote-tracking branch 'origin/beta' into pubmed_fix 2024-11-13 16:28:39 +01:00
Claudio Atzori 4a3b173ca2 defaults to 0000 - Unknown in case the instance type lookup in the dnet:result_typologies doesn't find a corresponding result type binding 2024-11-13 16:27:00 +01:00
sandro.labruzzo ac0a94d62d updated pubmed parser to add also ORCID id and affiliation string to authors 2024-11-13 16:26:59 +01:00
Giambattista Bloisi 5ee8881646 Merge pull request '[danishfunders] added link for danish funders versus the unidentified project for IRFD (501100004836) CF (501100002808) and NNF(501100009708)' (#502) from danishFunders_crossrefmap into beta
Reviewed-on: #502
2024-11-13 12:01:38 +01:00
Miriam Baglioni fb1f0f8850 [danishfunders] added the possibility to link also versus a specific award if present in the metadata 2024-11-13 12:00:33 +01:00
Giambattista Bloisi 03c262ccb9 Crossref: generate canonical openaire id for results in affiliation relationship 2024-11-13 10:56:17 +01:00
sandro.labruzzo a1d5ad5c26 code formatted 2024-11-13 09:51:13 +01:00
sandro.labruzzo b0478c380e merged conflicts on beta 2024-11-13 09:43:16 +01:00
Claudio Atzori 8088943399 Merge pull request 'enforce resulttype' (#506) from merge_resulttypes into beta
Reviewed-on: #506
2024-11-12 14:20:22 +01:00
Claudio Atzori 6c5df761e2 enforce resulttype based on the dnet:result_typologies vocabulary and upon merge 2024-11-12 14:18:04 +01:00
Miriam Baglioni 250f101779 [person] fixed issue in creating project identifier for the graph for person->project relations 2024-11-11 16:04:06 +01:00
Miriam Baglioni f1ea9da5bc [person] checked type in inferenceprovenance 2024-11-11 15:37:56 +01:00
Miriam Baglioni b0283fe94c [person] fix provenance of pid in person when it is orcid (classid entityregistry to avoid the cleaning put orcid_pending) 2024-11-11 14:57:57 +01:00
sandro.labruzzo 474f365286 removed wrong test 2024-11-11 12:37:27 +01:00
sandro.labruzzo 19ce783e58 renamed workflow 2024-11-11 12:28:02 +01:00
Sandro La Bruzzo 0d0904f4ec updated workflow baseline to direct transform on OAF 2024-11-11 10:27:23 +01:00
Miriam Baglioni 6fd9ec8566 [danishfunders] added link for danish funders versus the unidentified project for IRFD (501100004836), CF (501100002808) and NNF (501100009708) 2024-11-07 13:55:31 +01:00
Claudio Atzori f7bb53fe78 [orcid enrichment] added missing workflow parameter: workingDir 2024-11-07 01:04:43 +01:00
Claudio Atzori 973aa7dca6 [dedup] force the Relation schema when reading the merge rels 2024-11-06 12:29:06 +01:00
Sandro La Bruzzo c1cef5d685 removed old library joda time replaced with standard java.time introduced in java 8 2024-11-05 10:38:40 +01:00
Sandro La Bruzzo a8ed5a3b04 Organized getters and setters in the PMArticle class for better readability and maintainability. 2024-11-04 17:45:28 +01:00
Claudio Atzori a42c8b7c85 person table directory produced by the workflows raw_all and merge graphs 2024-10-30 11:25:17 +01:00
Claudio Atzori 323c76eafc patch relations job: removed non necessary logging 2024-10-30 07:35:30 +01:00
Miriam Baglioni 69aee609ef [bulktag] align type to community api 2024-10-29 15:53:04 +01:00
Claudio Atzori 499892b67c [graph raw] rule out empty PIDs 2024-10-29 09:51:30 +01:00
Claudio Atzori e4504fd98d [Person] fixed project identifier creation 2024-10-28 15:32:09 +01:00
Claudio Atzori 9b4415cb67 using _the right_ scala 2.11 converters 2024-10-28 13:56:25 +01:00
Claudio Atzori e6ca382deb using scala 2.11 converters 2024-10-28 13:52:06 +01:00
Claudio Atzori 940735921f Merge pull request 'Fill mergedIds field and filter mergerels with dedup records actually created' (#500) from mergedids into beta
Reviewed-on: #500
2024-10-28 13:43:09 +01:00
Giambattista Bloisi 56224e034a Fill the new mergedIds field when generating dedup records
Filter out dedup records composed of invisible records only
Filter out mergerels that have not been used when creating the dedup record (ungrouping of cliques)
2024-10-28 13:31:01 +01:00
Miriam Baglioni 5916346ba1 [TransformativeAgreement] fix to remove the file downloaded from a previous run of the workflow 2024-10-28 12:18:50 +01:00
Claudio Atzori e4abe55988 merged person_through_the_graph & code formatting 2024-10-28 11:01:49 +01:00
Claudio Atzori d71df6de19 Merge pull request 'affroNewModelonBeta' (#494) from affroNewModelonBeta into beta
Reviewed-on: #494
2024-10-28 10:48:34 +01:00
Claudio Atzori 1cdcd07a7e Merge pull request 'dhp-schema upgrade & provision mapping 2' (#499) from beta_provision_alignment_9.0.0 into beta
Reviewed-on: #499
2024-10-28 10:44:08 +01:00
Claudio Atzori 6fd50266f1 translate 'otherresearchproduct' into 'other' when setting the related record type 2024-10-28 10:42:46 +01:00
Claudio Atzori dffa376eb6 Merge pull request 'dhp-schema upgrade & provision mapping' (#498) from beta_provision_alignment_9.0.0 into beta
Reviewed-on: #498
2024-10-28 10:03:24 +01:00
Claudio Atzori 32fa579b80 [graph provision] select the longest abstract 2024-10-28 10:03:02 +01:00
Claudio Atzori 67e37f41fb Merge pull request 'blacklist filtering moved before the cleanup phase in order to have case sensitive regex' (#485) from dedup_blacklist_fix into beta
Reviewed-on: #485
2024-10-28 09:42:51 +01:00
Miriam Baglioni 0fb6af5586 Updated main pom dependency against dhp-schema, from 8.0.1 to 9.0.0. The new fields included in the updated schema module are populated by the Solr JSON payload mapping, which also limits the number of authors serialised to 200. 2024-10-25 16:28:50 +02:00
Claudio Atzori 46dbb62598 Merge pull request '#9839: include claimed affiliation relationships' (#476) from claim-orgs into beta
Reviewed-on: #476
2024-10-25 10:12:59 +02:00
Claudio Atzori 4a9aeb6238 Merge pull request '9126-impact-indicators-wf-optimisation' (#471) from 9126-impact-indicators-wf-optimisation into beta
Reviewed-on: #471
2024-10-25 10:10:44 +02:00
Claudio Atzori 8172bee8c8 Merge pull request 'Minor fixes' (#496) from beta_fixes_oct into beta
Reviewed-on: #496
2024-10-25 10:09:56 +02:00
Miriam Baglioni 1fce7d5a0f [Person] remove the isolated nodes from the person set 2024-10-25 10:05:17 +02:00
Miriam Baglioni 842cc75dae [AffRo] fix name 2024-10-25 09:42:52 +02:00
Miriam Baglioni e75326d6ec [FundersMatchFromCrossref] added match from CrossRef to DFG unidentified project 2024-10-25 09:13:54 +02:00
Miriam Baglioni 32f444984e [person] - 2024-10-24 17:51:42 +02:00
Miriam Baglioni cab8f1135f [affroNewModel] - 2024-10-24 17:44:33 +02:00
Miriam Baglioni c93bf82487 [affroNewModel] extended wf definition 2024-10-24 17:34:34 +02:00
Miriam Baglioni a7699558ed [person] - 2024-10-24 16:15:12 +02:00
Miriam Baglioni 01679c935a [person] added test class to be implemented 2024-10-24 15:27:06 +02:00
Miriam Baglioni c773421cc7 [person] added new substep in propagation workflow main 2024-10-24 14:44:13 +02:00
Miriam Baglioni cf07ed9058 [person] refactoring 2024-10-24 14:35:14 +02:00
Miriam Baglioni c921cf7ee0 [personEntity] removed the deletedbyinference results (not indexed, but still in the graph). Changed the writing mode: append instead of overwrite 2024-10-24 09:57:20 +02:00
Giambattista Bloisi aa7b8fd014 Use workingDir parameter for temporary data of ORCID enrichment 2024-10-23 14:02:17 +02:00
Giambattista Bloisi 0e34b0ece1 Fix imports: point them from the main distribution packages 2024-10-23 14:01:52 +02:00
Miriam Baglioni aac5eb3499 [personEntity] changed the data info for the relations with projects. added missing parameters to the job.properties file 2024-10-22 11:54:16 +02:00
Miriam Baglioni 821540f94a [personEntity] updated the property file to include also the db parameters. The same for the wf definition. Refactoring for compilation 2024-10-22 10:13:30 +02:00
Miriam Baglioni 09a2c93fc7 [personEntity] added relations with projects extracting the info from the database 2024-10-21 16:21:15 +02:00
Miriam Baglioni ce4ee1189f [personEntity] create entity for each profile in orcid even without works. Added validated true to each relation coming from orcid data 2024-10-21 14:38:15 +02:00
Miriam Baglioni 2b27afaec8 [createASfromAffRo] refactoring after compilation 2024-10-18 16:22:51 +02:00
Miriam Baglioni 0e5dd14538 [createASfromAffRo] adding the provenance datasource used to get the relation (no datasource can be webcrawl = publisher, rawaff means oalex) 2024-10-18 16:22:21 +02:00
Claudio Atzori 62ff843334 adopting dhp-schemas:8.0.1 to support Author's rawAffiliationString(s). Improved graph2hive implementation 2024-10-08 16:22:54 +02:00
Claudio Atzori d5867a1992 merged #490 2024-10-08 15:39:59 +02:00
Claudio Atzori e5df68772d [graph provision] fixed serialisation of the usage counts as measures in the XML records 2024-10-02 09:35:21 +02:00
Miriam Baglioni 7e6d12fa77 [UsageCount] fixed error
(cherry picked from commit 9c9a9562ae)
2024-10-01 15:55:07 +02:00
Miriam Baglioni 191fc3a461 [UsageCount] add check in case the datasource is not matched against those present in the graph
(cherry picked from commit b42bdd5fb3)
2024-10-01 15:54:31 +02:00
Claudio Atzori 10696f2a44 reverted procedure for creating the UsageCounts actionset 2024-10-01 15:54:13 +02:00
Claudio Atzori 5734b80861 Merge pull request 'datasource table creation split in steps' (#489) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: #489
2024-09-30 16:34:38 +02:00
Antonis Lempesis f3c179658a datasource table creation split in steps 2024-09-30 17:12:21 +03:00
Miriam Baglioni b18ad035c1 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2024-09-30 15:10:44 +02:00
Miriam Baglioni e430826e00 [ImportOC] fix to move original folder instead of extracted ones 2024-09-30 15:10:10 +02:00
Claudio Atzori 3fcafc7ed6 Merge pull request 'Latest institutions in monitor dbs' (#472) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: #472
2024-09-26 09:49:01 +02:00
Miriam Baglioni 599e56dbc6 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2024-09-25 17:28:23 +02:00
Claudio Atzori 6397141e56 code formatting 2024-09-25 15:27:32 +02:00
Claudio Atzori e354f9853a [OpenCitations] move the extracted contents under a backup path to avoid needing to re-download it in case of errors 2024-09-25 15:27:02 +02:00
Sandro La Bruzzo 6a097abc89 as described on ticket #9525
1. Changed the mapping applied to Crossref records: anything that has a relationship "is-review-of" must be mapped as publication of type "Review".
2. Force the hostedby of Crossref records with DOI prefix 10.3410 and 10.12703 to the H1 Connect data source.
2024-09-25 11:32:54 +02:00
Michele Artini 9754521847 Merge pull request 'fixed a bug with id' (#486) from osfPreprints_plugin into beta
Reviewed-on: #486
2024-09-25 10:02:24 +02:00
Michele Artini fa2532db30 fixed a bug with id 2024-09-25 09:38:50 +02:00
Michele Artini 54f8b4da39 Merge pull request 'fixed a bug with 'null' string' (#484) from osfPreprints_plugin into beta
Reviewed-on: #484
2024-09-24 15:19:54 +02:00
Michele Artini b35d046fd2 fixed a bug with 'null' string 2024-09-24 15:18:54 +02:00
Claudio Atzori 4f0463d779 [graph provision] person serialisation, limit the number of authorships and coauthorships before expanding the payloads 2024-09-24 14:54:34 +02:00
Miriam Baglioni 4d3e079590 Merge remote-tracking branch 'origin/beta' into beta 2024-09-24 14:26:29 +02:00
Claudio Atzori d1cadc77c9 [graph provision] person serialisation, limit the number of authorships and coauthorships before expanding the payloads 2024-09-24 10:57:20 +02:00
Michele Artini 0e89d4a1cf fixed a bug with topic ENRICH/MORE/SUBJECT/ARXIV 2024-09-24 08:57:49 +02:00
Michele Artini e941adbe2b fixed a bug with topic ENRICH/MORE/SUBJECT/ARXIV 2024-09-24 08:57:37 +02:00