Commit Graph

733 Commits

Author SHA1 Message Date
Michele Artini 455f2e1e07 apply commits from master 2024-03-15 14:56:39 +01:00
Michele Artini 88fef367b9 new plugin to collect from a dump of BASE 2024-03-15 10:47:52 +01:00
Sandro La Bruzzo 5281f010a5 applied cherry pick 2024-03-13 09:59:20 +01:00
Sandro La Bruzzo ee1fcb672b code refactor 2024-03-13 09:46:31 +01:00
Miriam Baglioni 5a32bb9578 [OC New] last fix 2024-03-13 09:36:18 +01:00
Sandro La Bruzzo c532831718 Moved Crossref Mapping on dhp-aggregations,
refactored code, avoid to use utility for create part of the oaf defined in DOIBoostMappingUtils, used instead utility in OafMappingUtils
2024-03-13 06:56:10 +01:00
Miriam Baglioni 48c052215c [OC New] last fix 2024-03-12 23:12:32 +01:00
Sandro La Bruzzo cbd4e5e4bb update mag mapping 2024-03-08 16:31:40 +01:00
Miriam Baglioni 5180b6ec8a [FOSNEW] removed test class 2024-03-07 10:47:13 +01:00
Miriam Baglioni 7827a2d66b [OCNEW] added creation of the actionset for the results classified with FoS based ont he OpenAIRE identifier 2024-03-07 10:36:30 +01:00
Miriam Baglioni fd34372c40 [OCNEW] first implementation 2024-03-06 13:42:00 +01:00
Sandro La Bruzzo d34cef3f8d Merge remote-tracking branch 'origin/beta' into doidoost_dismiss 2024-03-05 11:45:31 +01:00
Sandro La Bruzzo 3b837d38ce added oozie workflow 2024-03-05 11:44:59 +01:00
Sandro La Bruzzo f417515e43 Implemented class that generates a normalized table of MAG, which is the starting point for the creation of the mag source 2024-03-04 17:15:13 +01:00
Sandro La Bruzzo ad0e9aa80c added first part of refactoring of the code generating MAG,
make it more readable using spark sql queries
2024-02-29 18:16:15 +01:00
Giambattista Bloisi 3cd5590f3b When converting json to XML, remove characters that are not allowed in the XML 1.0 specs, as they will cause xpath failures even if escaped 2024-02-28 15:14:18 +01:00
Giambattista Bloisi 56dd05f85c Merge pull request 'Revised procedure when converting json data into xml' (#395) from restiterator_xmlcleanup into beta
Reviewed-on: #395
2024-02-28 10:38:54 +01:00
Sandro La Bruzzo 7d806a434c formatted code 2024-02-28 09:31:58 +01:00
Sandro La Bruzzo 915a76a796 following the comment on the pull requests:
- Added #NUM_OF_THREADS complete job in the queue at the end of  the main loop to avoid deadlock
2024-02-28 09:10:55 +01:00
Giambattista Bloisi 773e856550 Revised procedure when converting json data into xml:
- json object keys are renamed to be conformant to xml tag elements, special characters are substituted or removed
- json string values are no longer post-processed as they are already escaped by the org.json.XML.toString method
2024-02-24 16:54:30 +01:00
Sandro La Bruzzo a712df1e1d Merge remote-tracking branch 'origin/beta' into orcid_update 2024-02-23 10:12:25 +01:00
Sandro La Bruzzo b32a9d1994 Implemented workflow for updating table , added step to check if the new generated table is valid 2024-02-23 10:04:28 +01:00
Miriam Baglioni 72bae7af76 [Transformative Agreement] removed the relations from the ActionSet waiting to have the gree light from Ioanna 2024-02-19 16:20:12 +01:00
Serafeim Chatzopoulos f0dc12634b Add Action Set creation for affiliations inferred from the OpenAPC data 2024-02-18 18:02:09 +02:00
Miriam Baglioni eca021f4d6 [Transformative Agreement] add results with information abount the agreement and the country of the organization paid for it 2024-02-13 12:21:07 +01:00
Miriam Baglioni bdb6bbb365 mergin with branch beta 2024-02-12 15:50:43 +01:00
Miriam Baglioni 07a373a0bd [bulkTagging] removing checks while performing the substring action so that it will fire an Exception if the paramneters are wrongly set 2024-01-30 13:51:11 +01:00
Miriam Baglioni a418dacb47 [UsageCount] code extention to include also the name of the datasource 2024-01-29 18:12:33 +01:00
Miriam Baglioni e9131f4e4a mergin with branch beta 2024-01-29 16:27:18 +01:00
Sandro La Bruzzo 9aebca77a0 Added exception throwing in Hadoop transformation when TR is not syntactically valid 2024-01-29 14:41:02 +01:00
Sandro La Bruzzo 0386f36385 Added workflow to update ORCID and replaced some parsing, because the update works and employments xml differs from the dump one. 2024-01-25 19:40:59 +01:00
Sandro La Bruzzo 43e0bba7ed logg added during download 2024-01-23 15:04:49 +01:00
Miriam Baglioni f7d06dc661 compilation after merging 2024-01-23 11:43:08 +01:00
Miriam Baglioni e0ec800d7e [BulkTagging] extend the definition of the pathMap to include also actions that should be performed of the value extracted from the result befor applying the constraint 2024-01-23 11:34:53 +01:00
Sandro La Bruzzo e0753f19da Fixed error of connection timeout 2024-01-13 09:27:08 +01:00
sandro.labruzzo e328bc0ade fixed missing parameter on download update 2024-01-12 16:18:20 +01:00
Miriam Baglioni f612125939 fix issue on FoS integration. Removing the null values from FoS 2024-01-12 10:20:28 +01:00
Sandro La Bruzzo 859babf722 added some useful comment 2024-01-10 19:51:13 +01:00
Sandro La Bruzzo 8f61063201 Added workflow 2024-01-10 19:42:22 +01:00
Sandro La Bruzzo 1a42a5c10d Implemented Download update of ORCID 2024-01-10 18:03:20 +01:00
Miriam Baglioni 624f5f3f21 [Transformative Agreement] added check to verify the APC were paid byu the IReL funder 2023-12-18 15:28:19 +01:00
Miriam Baglioni 354e02e6a9 [Transformative Agreement] removed not needed class. Read directly the json and no need to pass from the csv 2023-12-18 15:20:27 +01:00
Miriam Baglioni b00771c7cc [Transformative Agreement] added code to extract relations from the transformative agreement file for the IE products got from OpenAPC 2023-12-18 15:12:44 +01:00
Sandro La Bruzzo 15fd93a2b6 uploaded input parameters on CreateBaseline WF 2023-12-18 12:21:55 +01:00
Sandro La Bruzzo 9d342a47da updated the transformation Baseline workflow to include mdstore rollback/commit action 2023-12-18 11:48:57 +01:00
Claudio Atzori 33cb483c75 using objectSubType as originalType in Crossref2Oaf, code formatting 2023-12-01 15:03:05 +01:00
Claudio Atzori 622fafbd2e Merge branch 'beta' into orcid_import 2023-12-01 12:28:14 +01:00
Sandro La Bruzzo cdfb7588dd code formatting 2023-11-30 15:31:42 +01:00
Sandro La Bruzzo 5e22b67b8a Merge remote-tracking branch 'origin/beta' into orcid_import 2023-11-30 15:27:46 +01:00
Claudio Atzori 6f10791e77 Merge branch 'beta' into propagationapi 2023-11-30 14:20:18 +01:00
Claudio Atzori 4e1aac2e2f resolved conflict in pom.xml before applying the changes from [COAR based resource types & Irish tender] #350 2023-11-29 14:37:52 +01:00
Sandro La Bruzzo 86b5775e08 added vocabulary in instanceTypeMapping for
- DOIBoost
- Datacite
- PubMed
- Scholexplorer Datasource
2023-11-29 13:15:43 +01:00
Sandro La Bruzzo af1c2634b3 added instanceTypeMapping original field in the mapping of
- DOIBoost
- Datacite
- PubMed
- Scholexplorer Datasource
2023-11-29 12:45:30 +01:00
Miriam Baglioni 8eb70e6657 refactoring 2023-11-27 15:13:15 +01:00
Sandro La Bruzzo 34a4b3cbdf Implemented ORCID Enrichment 2023-11-24 12:39:58 +01:00
Sandro La Bruzzo 6ce36b3e41 Implemented ORCID Workflow on DHP-Aggregation for retrieving ORCID DUMP and generating tables 2023-11-14 12:04:29 +01:00
Serafeim Chatzopoulos 2090003ea9 Adjust tests to new WF input params 2023-10-26 13:47:06 -07:00
Serafeim Chatzopoulos a82aaf57b2 Renaming input param for crossref input path 2023-10-25 12:05:02 -07:00
Serafeim Chatzopoulos aad5982bf1 Change the description of the workflow 2023-10-20 12:48:21 +03:00
Serafeim Chatzopoulos 6b19dcee80 Add actionset creation for pubmed affiliations 2023-10-19 19:58:25 +03:00
Claudio Atzori a460ebe215 [UnresolvedEntities] updated action name 2023-10-10 15:50:11 +02:00
Miriam Baglioni a431b04814 leftover for the properties and removal of bipfinder 2023-10-10 12:53:57 +02:00
Miriam Baglioni 110ce4b40f extend the fos model to include the level4 and the scores for level3 and level4. removed bip indicators from the instance 2023-10-10 09:46:40 +02:00
Claudio Atzori 84a58802ab [OC] using the common pid cleaning function 2023-10-06 14:48:05 +02:00
Claudio Atzori 46034630cf [OC] compress the output actionset 2023-10-06 14:42:02 +02:00
Claudio Atzori ee8a39e7d2 cleanup and refinements 2023-10-04 12:32:05 +02:00
Miriam Baglioni d7fccdc64b fixed paths in wf to match the req of the pathname 2023-10-02 14:10:57 +02:00
Miriam Baglioni 9898470b0e Addressing comments in #340\#issuecomment-10592 2023-10-02 12:54:16 +02:00
Miriam Baglioni e84f5b5e64 extended existing codo to accomodate import of POCI from open citation 2023-10-02 09:25:16 +02:00
Claudio Atzori 4786aa0e09 added Archive ouverte UNIGE (ETHZ.UNIGENF, opendoar____::1400) to the Datacite hostedBy_map 2023-09-07 11:21:07 +02:00
Claudio Atzori 15666e86a8 added collectedfrom to the affiliation relations imported from Crossref 2023-09-04 15:56:06 +02:00
Serafeim Chatzopoulos 7de0164c26 Fix import of affiliations relations from Crossref 2023-09-04 16:04:41 +03:00
Miriam Baglioni 9c8b41475a Merge pull request '8172_impact_indicators_workflow' (#284) from 8172_impact_indicators_workflow into beta
Reviewed-on: #284
2023-08-14 15:50:48 +02:00
Serafeim Chatzopoulos 97c1ba8918 Merge actionsets of results and projects 2023-08-11 15:56:53 +03:00
Serafeim Chatzopoulos 7cefe2665b Remove unnecessary classes 2023-07-28 19:14:39 +03:00
Serafeim Chatzopoulos 26a92ce762 Merge branch '8876' of https://code-repo.d4science.org/D-Net/dnet-hadoop into 8876 2023-07-28 19:03:57 +03:00
Serafeim Chatzopoulos ebfba38ab6 Add changes from code review 2023-07-28 19:03:47 +03:00
Serafeim Chatzopoulos eb8684a8cf Merge branch 'beta' into 8876 2023-07-28 13:39:33 +02:00
Giambattista Bloisi e64c2854a3 Refactor Dedup process to use Spark Dataframe API and intermediate representation with Row interface
JsonPath cache contention fixed by using a ConcurrentHashMap
Blacklist filtering performance improvement
Minor performance improvements when evaluating similarity
Sorting in clustered elements is deterministic (by ordering and identity field, instead of ordering field only)
2023-07-24 15:36:24 +02:00
Giambattista Bloisi bb5b845e3c Use scala.binary.version property to resolve scala maven dependencies
Ensure consistent usage of maven properties
Profile for compiling with scala 2.12 and Spark 3.4
2023-07-24 11:13:48 +02:00
Serafeim Chatzopoulos 2cc5b1a39b Fixes in workflow.xml 2023-07-21 15:26:50 +03:00
Serafeim Chatzopoulos be320ba3c1 Indentation fixes 2023-07-17 16:04:21 +03:00
Serafeim Chatzopoulos bc1a4611aa Minor changes 2023-07-17 11:17:53 +03:00
Serafeim Chatzopoulos 4eba14a80e Add oozie workflow 2023-07-06 21:07:50 +03:00
Serafeim Chatzopoulos c2998a14e8 Add basic tests for affiliation relations 2023-07-06 20:28:16 +03:00
Serafeim Chatzopoulos bc7b00bcd1 Add bi-directional affiliation relations 2023-07-06 18:29:15 +03:00
Serafeim Chatzopoulos 12528ed2ef Refactor PrepareAffiliationRelations.java to use OafMapperUtils common functions 2023-07-06 18:08:33 +03:00
Serafeim Chatzopoulos bbc245696e Prepare actionsets for BIP affiliations 2023-07-06 15:56:12 +03:00
Serafeim Chatzopoulos 347a889b20 Read affiliation relations 2023-07-06 00:51:01 +03:00
Miriam Baglioni 4c9bc4c3a5 refactoring 2023-06-30 19:05:15 +02:00
Miriam Baglioni 7738372125 [UsageCount] fixed typo in attribute name for datasource table 2023-06-30 18:56:41 +02:00
Miriam Baglioni 55ea485783 [UsageCount] split the count for result at the level of the datasource. for each indicator one unit is specified for each datasource contrinuting to that indicator value. The datasource key is the value of the key element in the unit for the measure, while the count for that datasource is in the value 2023-06-30 18:39:30 +02:00
Michele Artini 88a1cbc37d fixed a datasource id 2023-06-22 07:56:33 +02:00
Alessia Bardi d5be6a13e9 Updated officialnmae of pangaea in hostedbymap for Datacite to avoid duplicate entries in the source filter of the portal 2023-06-06 14:43:32 +02:00
Claudio Atzori 8acad52a0c Merge branch 'beta' into apc_affiliation 2023-05-15 15:47:33 +02:00
Claudio Atzori 8a463cc3e8 fixed organization id created when mapping APC affiliations. Factored out ROR constants in dhp-common 2023-05-15 15:44:46 +02:00
Miriam Baglioni 78b07400c0 changed test classes 2023-05-15 11:37:08 +02:00
Miriam Baglioni 86fe886c1a removed the inverse of the Citing relation 2023-05-15 11:20:51 +02:00
Serafeim Chatzopoulos 815a4ddbba Add actionset creation for project bip indicators in workflow 2023-04-26 20:40:06 +03:00
Serafeim Chatzopoulos ee04cf92bf Add actionsets for project impact indicators 2023-04-26 20:23:46 +03:00