Commit Graph

528 Commits

Author SHA1 Message Date
Claudio Atzori 1726f49790 code formatting 2023-12-15 10:37:02 +01:00
Claudio Atzori 33cb483c75 using objectSubType as originalType in Crossref2Oaf, code formatting 2023-12-01 15:03:05 +01:00
Claudio Atzori 622fafbd2e Merge branch 'beta' into orcid_import 2023-12-01 12:28:14 +01:00
Sandro La Bruzzo 5e22b67b8a Merge remote-tracking branch 'origin/beta' into orcid_import 2023-11-30 15:27:46 +01:00
Claudio Atzori 4e1aac2e2f resolved conflict in pom.xml before applying the changes from [COAR based resource types & Irish tender] #350 2023-11-29 14:37:52 +01:00
Sandro La Bruzzo 86b5775e08 added vocabulary in instanceTypeMapping for DOIBoost, Datacite, PubMed, Scholexplorer Datasource 2023-11-29 13:15:43 +01:00
Sandro La Bruzzo af1c2634b3 added instanceTypeMapping original field in the mapping of DOIBoost, Datacite, PubMed, Scholexplorer Datasource 2023-11-29 12:45:30 +01:00
Sandro La Bruzzo 6ce36b3e41 Implemented ORCID Workflow on DHP-Aggregation for retrieving ORCID DUMP and generating tables 2023-11-14 12:04:29 +01:00
Claudio Atzori 8c03c41d5d applying changes from beta 2023-11-03 12:08:39 +01:00
Serafeim Chatzopoulos 7e34dde774 Renaming input param for crossref input path 2023-11-02 17:47:04 +02:00
Serafeim Chatzopoulos 24c3f92d87 Change the description of the workflow 2023-11-02 17:46:51 +02:00
Serafeim Chatzopoulos 6ce9b600c1 Add actionset creation for pubmed affiliations 2023-11-02 17:46:39 +02:00
Serafeim Chatzopoulos a82aaf57b2 Renaming input param for crossref input path 2023-10-25 12:05:02 -07:00
Serafeim Chatzopoulos aad5982bf1 Change the description of the workflow 2023-10-20 12:48:21 +03:00
Serafeim Chatzopoulos 6b19dcee80 Add actionset creation for pubmed affiliations 2023-10-19 19:58:25 +03:00
Claudio Atzori a460ebe215 [UnresolvedEntities] updated action name 2023-10-10 15:50:11 +02:00
Miriam Baglioni a431b04814 leftover for the properties and removal of bipfinder 2023-10-10 12:53:57 +02:00
Miriam Baglioni 110ce4b40f extend the fos model to include the level4 and the scores for level3 and level4. removed bip indicators from the instance 2023-10-10 09:46:40 +02:00
Claudio Atzori 84a58802ab [OC] using the common pid cleaning function 2023-10-06 14:48:05 +02:00
Claudio Atzori 46034630cf [OC] compress the output actionset 2023-10-06 14:42:02 +02:00
Claudio Atzori ee8a39e7d2 cleanup and refinements 2023-10-04 12:32:05 +02:00
Miriam Baglioni d7fccdc64b fixed paths in wf to match the req of the pathname 2023-10-02 14:10:57 +02:00
Miriam Baglioni 9898470b0e Addressing comments in #340#issuecomment-10592 2023-10-02 12:54:16 +02:00
Miriam Baglioni e84f5b5e64 extended existing code to accommodate import of POCI from open citation 2023-10-02 09:25:16 +02:00
Sandro La Bruzzo 9c3ab11d5b Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2023-09-25 15:29:19 +02:00
Sandro La Bruzzo 423ef30676 minor fix on the aggregation of uniprot and pdb 2023-09-25 15:28:58 +02:00
Claudio Atzori 4786aa0e09 added Archive ouverte UNIGE (ETHZ.UNIGENF, opendoar____::1400) to the Datacite hostedBy_map 2023-09-07 11:21:07 +02:00
Claudio Atzori 265180bfd2 added Archive ouverte UNIGE (ETHZ.UNIGENF, opendoar____::1400) to the Datacite hostedBy_map 2023-09-07 11:20:35 +02:00
Claudio Atzori 15666e86a8 added collectedfrom to the affiliation relations imported from Crossref 2023-09-04 15:56:06 +02:00
Serafeim Chatzopoulos 7de0164c26 Fix import of affiliations relations from Crossref 2023-09-04 16:04:41 +03:00
Miriam Baglioni 9c8b41475a Merge pull request '8172_impact_indicators_workflow' (#284) from 8172_impact_indicators_workflow into beta (Reviewed-on: #284) 2023-08-14 15:50:48 +02:00
Serafeim Chatzopoulos 97c1ba8918 Merge actionsets of results and projects 2023-08-11 15:56:53 +03:00
Serafeim Chatzopoulos 7cefe2665b Remove unnecessary classes 2023-07-28 19:14:39 +03:00
Serafeim Chatzopoulos 26a92ce762 Merge branch '8876' of https://code-repo.d4science.org/D-Net/dnet-hadoop into 8876 2023-07-28 19:03:57 +03:00
Serafeim Chatzopoulos ebfba38ab6 Add changes from code review 2023-07-28 19:03:47 +03:00
Serafeim Chatzopoulos eb8684a8cf Merge branch 'beta' into 8876 2023-07-28 13:39:33 +02:00
Giambattista Bloisi e64c2854a3 Refactor Dedup process to use Spark Dataframe API and intermediate representation with Row interface; JsonPath cache contention fixed by using a ConcurrentHashMap; Blacklist filtering performance improvement; Minor performance improvements when evaluating similarity; Sorting in clustered elements is deterministic (by ordering and identity field, instead of ordering field only) 2023-07-24 15:36:24 +02:00
Serafeim Chatzopoulos 2cc5b1a39b Fixes in workflow.xml 2023-07-21 15:26:50 +03:00
Serafeim Chatzopoulos be320ba3c1 Indentation fixes 2023-07-17 16:04:21 +03:00
Serafeim Chatzopoulos bc1a4611aa Minor changes 2023-07-17 11:17:53 +03:00
Serafeim Chatzopoulos 4eba14a80e Add oozie workflow 2023-07-06 21:07:50 +03:00
Serafeim Chatzopoulos bc7b00bcd1 Add bi-directional affiliation relations 2023-07-06 18:29:15 +03:00
Serafeim Chatzopoulos 12528ed2ef Refactor PrepareAffiliationRelations.java to use OafMapperUtils common functions 2023-07-06 18:08:33 +03:00
Serafeim Chatzopoulos bbc245696e Prepare actionsets for BIP affiliations 2023-07-06 15:56:12 +03:00
Serafeim Chatzopoulos 347a889b20 Read affiliation relations 2023-07-06 00:51:01 +03:00
Miriam Baglioni 7738372125 [UsageCount] fixed typo in attribute name for datasource table 2023-06-30 18:56:41 +02:00
Michele Artini 88a1cbc37d fixed a datasource id 2023-06-22 07:56:33 +02:00
Michele Artini 009d7f312f fixed a datasource Id 2023-06-21 16:17:34 +02:00
Alessia Bardi d5be6a13e9 Updated officialname of pangaea in hostedbymap for Datacite to avoid duplicate entries in the source filter of the portal 2023-06-06 14:43:32 +02:00
Alessia Bardi 118e72d7db Updated officialname of pangaea in hostedbymap for Datacite to avoid duplicate entries in the source filter of the portal 2023-06-06 14:39:12 +02:00
Claudio Atzori db625e548d [UsageCount] addition of usagecount for Projects and datasources 2023-05-22 15:00:46 +02:00
Claudio Atzori 8acad52a0c Merge branch 'beta' into apc_affiliation 2023-05-15 15:47:33 +02:00
Claudio Atzori 8a463cc3e8 fixed organization id created when mapping APC affiliations. Factored out ROR constants in dhp-common 2023-05-15 15:44:46 +02:00
Miriam Baglioni 86fe886c1a removed the inverse of the Citing relation 2023-05-15 11:20:51 +02:00
Serafeim Chatzopoulos 815a4ddbba Add actionset creation for project bip indicators in workflow 2023-04-26 20:40:06 +03:00
Serafeim Chatzopoulos ee04cf92bf Add actionsets for project impact indicators 2023-04-26 20:23:46 +03:00
Miriam Baglioni d4fc62c2f6 merging with branch beta 2023-03-02 11:14:54 +01:00
Miriam Baglioni de8ad1caef [ECclassification] new implementation for the H2020 classification 2023-03-02 11:14:03 +01:00
Miriam Baglioni c1f9848953 [ECclassification] added new classes 2023-03-01 15:29:11 +01:00
Claudio Atzori 16ad42e8f3 code formatting 2023-03-01 10:22:13 +01:00
Miriam Baglioni 4f2df876cd [ECclassification] new implementation first try 2023-02-28 14:44:00 +01:00
Claudio Atzori 2f7346e9cf WIP monodirectional citations, Datacite 2023-02-28 13:30:51 +01:00
Claudio Atzori 7aebedb43c code formatting 2023-02-27 11:51:27 +01:00
Miriam Baglioni 80987801d7 [FoS] added check for null on level1 subject 2023-02-27 11:40:22 +01:00
Claudio Atzori 31e97c2a6b [unresolved entities] updated oozie wf node labels 2023-02-27 11:38:29 +01:00
Miriam Baglioni 23112929e9 [FoS] changed the default separator from comma to tab to solve the issue in subject value split 2023-02-27 10:18:39 +01:00
Claudio Atzori 0c1be41b30 code formatting 2023-02-22 10:15:25 +01:00
Claudio Atzori 477a7c416f Merge branch 'beta' into UsageCountOnProjectAndDatasource 2023-02-22 09:55:51 +01:00
Miriam Baglioni 016337a0f9 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2023-02-16 15:54:59 +01:00
Claudio Atzori 9a03f71db1 code formatting 2023-02-13 16:25:47 +01:00
Miriam Baglioni 32870339f5 refactoring after compile 2023-02-13 13:06:48 +01:00
Miriam Baglioni 7184cc0804 [FoS] added check for null on level1 subject 2023-02-13 13:03:49 +01:00
Miriam Baglioni 5cf902a2b0 [UsageCount] changed query to make the sum be computed via sql instead of grouping 2023-02-10 16:16:37 +01:00
Miriam Baglioni f803530df6 [UsageCount] fixed query 2023-02-10 15:50:56 +01:00
Miriam Baglioni 7473093c84 [FoS] changed the default separator from comma to tab to solve the issue in subject value split 2023-02-10 15:34:52 +01:00
Miriam Baglioni 85e53fad00 [UsageCount] addition of usagecount for Projects and datasources. Extension of the action set created for the results with new entities for projects and datasources. Extension of the resource set and modification of the testing class 2023-02-09 18:59:45 +01:00
Claudio Atzori f86e19b282 code formatting 2023-01-11 09:53:19 +01:00
Sandro La Bruzzo 3c9826f186 replaced the lines function with its implementation, linesWithSeparators.map(l => l.stripLineEnd), to force the Scala compiler plugin to treat this pipeline as Scala code and not as a Java String.lines() pipeline 2022-12-21 11:21:17 +01:00
Sandro La Bruzzo 91c70b15a5 replaced the lines function with its implementation, linesWithSeparators.map(l => l.stripLineEnd), to force the Scala compiler plugin to treat this pipeline as Scala code and not as a Java String.lines() pipeline 2022-12-21 11:14:42 +01:00
Sandro La Bruzzo 72f0d88d6c formatted code 2022-10-19 14:18:42 +02:00
Sandro La Bruzzo a1f94530a3 added documentation 2022-10-13 11:47:11 +02:00
Claudio Atzori 27a91841e7 WIP: cleaning of subjects 2022-08-04 11:39:39 +02:00
Claudio Atzori eb53b52f7c code formatting 2022-08-02 13:24:47 +02:00
Claudio Atzori 209c7e9dab [datacite] avoid UnsupportedOperationException 2022-08-01 09:05:35 +02:00
Claudio Atzori 92e48f12f7 [metadata collection] updated collector plugin name 2022-07-29 13:54:00 +02:00
Claudio Atzori f62c4e05cd code formatting 2022-07-29 11:56:01 +02:00
Claudio Atzori ed98a6d9d0 [Datacite mapping] include the older datacite prefixed OpenAIRE id among the originalId[] 2022-07-28 10:15:14 +02:00
Sandro La Bruzzo 0a4f4d98fa added PMCId to PmArticle 2022-07-13 15:27:17 +02:00
Claudio Atzori 929b145130 code formatting 2022-06-21 23:07:06 +02:00
Claudio Atzori 06b5533d4c Merge branch 'beta' into 7096-fileGZip-collector-plugin 2022-06-16 09:22:16 +02:00
Alessia Bardi 88d531dc91 exclude FAIRsharing records from Datacite 2022-06-13 16:17:17 +02:00
Claudio Atzori b8cda65487 code formatting 2022-06-13 09:20:03 +02:00
Michele Artini 634869ce95 deleted hierarchical rels from ror action set 2022-06-13 09:12:21 +02:00
Claudio Atzori d098ad0d93 [hb patch] updated map 2022-05-16 15:54:04 +02:00
Miriam Baglioni 89657a0b78 [UsageCount] refactoring 2022-05-09 14:43:27 +02:00
Miriam Baglioni a056f59c6e [UsageCount] made it an action set, as it should be, and changed the tests so that they work as well 2022-05-09 12:51:35 +02:00
Serafeim Chatzopoulos 623f7be26d Fix reading files from HDFS in FileCollector & FileGZipCollector plugins 2022-04-28 16:31:11 +03:00
Claudio Atzori 30105f0722 Merge branch 'beta' into 7096-fileGZip-collector-plugin 2022-04-22 11:22:21 +02:00
Miriam Baglioni 20de75ca64 [Measures] removed typo 2022-04-21 12:14:03 +02:00
Miriam Baglioni b61efd613b [Measures] addressed comments in the PR 2022-04-21 12:09:37 +02:00