1
0
Fork 0
Commit Graph

875 Commits

Author SHA1 Message Date
Miriam Baglioni fd34372c40 [OCNEW] first implementation 2024-03-06 13:42:00 +01:00
Sandro La Bruzzo d34cef3f8d Merge remote-tracking branch 'origin/beta' into doidoost_dismiss 2024-03-05 11:45:31 +01:00
Sandro La Bruzzo 3b837d38ce added oozie workflow 2024-03-05 11:44:59 +01:00
Sandro La Bruzzo f417515e43 Implemented class that generates a normalized table of MAG, which is the starting point for the creation of the mag source 2024-03-04 17:15:13 +01:00
Sandro La Bruzzo ad0e9aa80c added first part of refactoring of the code generating MAG,
make it more readable using spark sql queries
2024-02-29 18:16:15 +01:00
Giambattista Bloisi 3cd5590f3b When converting json to XML, remove characters that are not allowed in the XML 1.0 specs, as they will cause xpath failures even if escaped 2024-02-28 15:14:18 +01:00
Giambattista Bloisi 56dd05f85c Merge pull request 'Revised procedure when converting json data into xml' (#395) from restiterator_xmlcleanup into beta
Reviewed-on: D-Net/dnet-hadoop#395
2024-02-28 10:38:54 +01:00
Sandro La Bruzzo 7d806a434c formatted code 2024-02-28 09:31:58 +01:00
Sandro La Bruzzo 915a76a796 following the comment on the pull requests:
- Added #NUM_OF_THREADS complete job in the queue at the end of  the main loop to avoid deadlock
2024-02-28 09:10:55 +01:00
Giambattista Bloisi 773e856550 Revised procedure when converting json data into xml:
- json object keys are renamed to be conformant to xml tag elements, special characters are substituted or removed
- json string values are no longer post-processed as they are already escaped by the org.json.XML.toString method
2024-02-24 16:54:30 +01:00
Sandro La Bruzzo a712df1e1d Merge remote-tracking branch 'origin/beta' into orcid_update 2024-02-23 10:12:25 +01:00
Sandro La Bruzzo b32a9d1994 Implemented workflow for updating table , added step to check if the new generated table is valid 2024-02-23 10:04:28 +01:00
Miriam Baglioni 72bae7af76 [Transformative Agreement] removed the relations from the ActionSet waiting to have the gree light from Ioanna 2024-02-19 16:20:12 +01:00
Serafeim Chatzopoulos f0dc12634b Add Action Set creation for affiliations inferred from the OpenAPC data 2024-02-18 18:02:09 +02:00
Miriam Baglioni eca021f4d6 [Transformative Agreement] add results with information abount the agreement and the country of the organization paid for it 2024-02-13 12:21:07 +01:00
Miriam Baglioni bdb6bbb365 mergin with branch beta 2024-02-12 15:50:43 +01:00
Miriam Baglioni 07a373a0bd [bulkTagging] removing checks while performing the substring action so that it will fire an Exception if the paramneters are wrongly set 2024-01-30 13:51:11 +01:00
Miriam Baglioni a418dacb47 [UsageCount] code extention to include also the name of the datasource 2024-01-29 18:12:33 +01:00
Miriam Baglioni e9131f4e4a mergin with branch beta 2024-01-29 16:27:18 +01:00
Sandro La Bruzzo 9aebca77a0 Added exception throwing in Hadoop transformation when TR is not syntactically valid 2024-01-29 14:41:02 +01:00
Sandro La Bruzzo 0386f36385 Added workflow to update ORCID and replaced some parsing, because the update works and employments xml differs from the dump one. 2024-01-25 19:40:59 +01:00
Sandro La Bruzzo 43e0bba7ed logg added during download 2024-01-23 15:04:49 +01:00
Miriam Baglioni f7d06dc661 compilation after merging 2024-01-23 11:43:08 +01:00
Miriam Baglioni e0ec800d7e [BulkTagging] extend the definition of the pathMap to include also actions that should be performed of the value extracted from the result befor applying the constraint 2024-01-23 11:34:53 +01:00
Sandro La Bruzzo e0753f19da Fixed error of connection timeout 2024-01-13 09:27:08 +01:00
sandro.labruzzo e328bc0ade fixed missing parameter on download update 2024-01-12 16:18:20 +01:00
Miriam Baglioni f612125939 fix issue on FoS integration. Removing the null values from FoS 2024-01-12 10:20:28 +01:00
Sandro La Bruzzo 859babf722 added some useful comment 2024-01-10 19:51:13 +01:00
Sandro La Bruzzo 8f61063201 Added workflow 2024-01-10 19:42:22 +01:00
Sandro La Bruzzo 1a42a5c10d Implemented Download update of ORCID 2024-01-10 18:03:20 +01:00
Miriam Baglioni 624f5f3f21 [Transformative Agreement] added check to verify the APC were paid byu the IReL funder 2023-12-18 15:28:19 +01:00
Miriam Baglioni 354e02e6a9 [Transformative Agreement] removed not needed class. Read directly the json and no need to pass from the csv 2023-12-18 15:20:27 +01:00
Miriam Baglioni b00771c7cc [Transformative Agreement] added code to extract relations from the transformative agreement file for the IE products got from OpenAPC 2023-12-18 15:12:44 +01:00
Sandro La Bruzzo 15fd93a2b6 uploaded input parameters on CreateBaseline WF 2023-12-18 12:21:55 +01:00
Sandro La Bruzzo 9d342a47da updated the transformation Baseline workflow to include mdstore rollback/commit action 2023-12-18 11:48:57 +01:00
Giambattista Bloisi 613ec5ffce Add profiles for different spark versions: spark-24, spark-34, spark-35 2023-12-05 19:11:06 +01:00
Sandro La Bruzzo 52495f2cd2 used javax.xml.stream.XMLEventReader instead of deprecated scala.xml.pull.XMLEventReader 2023-12-05 19:11:06 +01:00
Claudio Atzori 33cb483c75 using objectSubType as originalType in Crossref2Oaf, code formatting 2023-12-01 15:03:05 +01:00
Claudio Atzori 622fafbd2e Merge branch 'beta' into orcid_import 2023-12-01 12:28:14 +01:00
Sandro La Bruzzo cdfb7588dd code formatting 2023-11-30 15:31:42 +01:00
Sandro La Bruzzo 5e22b67b8a Merge remote-tracking branch 'origin/beta' into orcid_import 2023-11-30 15:27:46 +01:00
Claudio Atzori 6f10791e77 Merge branch 'beta' into propagationapi 2023-11-30 14:20:18 +01:00
Claudio Atzori 4e1aac2e2f resolved conflict in pom.xml before applying the changes from [COAR based resource types & Irish tender] #350 2023-11-29 14:37:52 +01:00
Sandro La Bruzzo 86b5775e08 added vocabulary in instanceTypeMapping for
- DOIBoost
- Datacite
- PubMed
- Scholexplorer Datasource
2023-11-29 13:15:43 +01:00
Sandro La Bruzzo af1c2634b3 added instanceTypeMapping original field in the mapping of
- DOIBoost
- Datacite
- PubMed
- Scholexplorer Datasource
2023-11-29 12:45:30 +01:00
Miriam Baglioni 8eb70e6657 refactoring 2023-11-27 15:13:15 +01:00
Sandro La Bruzzo 34a4b3cbdf Implemented ORCID Enrichment 2023-11-24 12:39:58 +01:00
Sandro La Bruzzo 6ce36b3e41 Implemented ORCID Workflow on DHP-Aggregation for retrieving ORCID DUMP and generating tables 2023-11-14 12:04:29 +01:00
Serafeim Chatzopoulos 2090003ea9 Adjust tests to new WF input params 2023-10-26 13:47:06 -07:00
Serafeim Chatzopoulos a82aaf57b2 Renaming input param for crossref input path 2023-10-25 12:05:02 -07:00
Serafeim Chatzopoulos aad5982bf1 Change the description of the workflow 2023-10-20 12:48:21 +03:00
Serafeim Chatzopoulos 6b19dcee80 Add actionset creation for pubmed affiliations 2023-10-19 19:58:25 +03:00
Claudio Atzori a460ebe215 [UnresolvedEntities] updated action name 2023-10-10 15:50:11 +02:00
Miriam Baglioni a431b04814 leftover for the properties and removal of bipfinder 2023-10-10 12:53:57 +02:00
Miriam Baglioni 110ce4b40f extend the fos model to include the level4 and the scores for level3 and level4. removed bip indicators from the instance 2023-10-10 09:46:40 +02:00
Claudio Atzori 84a58802ab [OC] using the common pid cleaning function 2023-10-06 14:48:05 +02:00
Claudio Atzori 46034630cf [OC] compress the output actionset 2023-10-06 14:42:02 +02:00
Claudio Atzori ee8a39e7d2 cleanup and refinements 2023-10-04 12:32:05 +02:00
Miriam Baglioni d7fccdc64b fixed paths in wf to match the req of the pathname 2023-10-02 14:10:57 +02:00
Miriam Baglioni 9898470b0e Addressing comments in D-Net/dnet-hadoop#340\#issuecomment-10592 2023-10-02 12:54:16 +02:00
Miriam Baglioni e84f5b5e64 extended existing codo to accomodate import of POCI from open citation 2023-10-02 09:25:16 +02:00
Claudio Atzori 4786aa0e09 added Archive ouverte UNIGE (ETHZ.UNIGENF, opendoar____::1400) to the Datacite hostedBy_map 2023-09-07 11:21:07 +02:00
Claudio Atzori 15666e86a8 added collectedfrom to the affiliation relations imported from Crossref 2023-09-04 15:56:06 +02:00
Serafeim Chatzopoulos 7de0164c26 Fix import of affiliations relations from Crossref 2023-09-04 16:04:41 +03:00
Miriam Baglioni 9c8b41475a Merge pull request '8172_impact_indicators_workflow' (#284) from 8172_impact_indicators_workflow into beta
Reviewed-on: D-Net/dnet-hadoop#284
2023-08-14 15:50:48 +02:00
Serafeim Chatzopoulos 97c1ba8918 Merge actionsets of results and projects 2023-08-11 15:56:53 +03:00
Serafeim Chatzopoulos 7cefe2665b Remove unnecessary classes 2023-07-28 19:14:39 +03:00
Serafeim Chatzopoulos 26a92ce762 Merge branch '8876' of https://code-repo.d4science.org/D-Net/dnet-hadoop into 8876 2023-07-28 19:03:57 +03:00
Serafeim Chatzopoulos ebfba38ab6 Add changes from code review 2023-07-28 19:03:47 +03:00
Serafeim Chatzopoulos eb8684a8cf Merge branch 'beta' into 8876 2023-07-28 13:39:33 +02:00
Giambattista Bloisi e64c2854a3 Refactor Dedup process to use Spark Dataframe API and intermediate representation with Row interface
JsonPath cache contention fixed by using a ConcurrentHashMap
Blacklist filtering performance improvement
Minor performance improvements when evaluating similarity
Sorting in clustered elements is deterministic (by ordering and identity field, instead of ordering field only)
2023-07-24 15:36:24 +02:00
Giambattista Bloisi bb5b845e3c Use scala.binary.version property to resolve scala maven dependencies
Ensure consistent usage of maven properties
Profile for compiling with scala 2.12 and Spark 3.4
2023-07-24 11:13:48 +02:00
Serafeim Chatzopoulos 2cc5b1a39b Fixes in workflow.xml 2023-07-21 15:26:50 +03:00
Serafeim Chatzopoulos be320ba3c1 Indentation fixes 2023-07-17 16:04:21 +03:00
Serafeim Chatzopoulos bc1a4611aa Minor changes 2023-07-17 11:17:53 +03:00
Serafeim Chatzopoulos 4eba14a80e Add oozie workflow 2023-07-06 21:07:50 +03:00
Serafeim Chatzopoulos c2998a14e8 Add basic tests for affiliation relations 2023-07-06 20:28:16 +03:00
Serafeim Chatzopoulos bc7b00bcd1 Add bi-directional affiliation relations 2023-07-06 18:29:15 +03:00
Serafeim Chatzopoulos 12528ed2ef Refactor PrepareAffiliationRelations.java to use OafMapperUtils common functions 2023-07-06 18:08:33 +03:00
Serafeim Chatzopoulos bbc245696e Prepare actionsets for BIP affiliations 2023-07-06 15:56:12 +03:00
Serafeim Chatzopoulos 347a889b20 Read affiliation relations 2023-07-06 00:51:01 +03:00
Miriam Baglioni 4c9bc4c3a5 refactoring 2023-06-30 19:05:15 +02:00
Miriam Baglioni 7738372125 [UsageCount] fixed typo in attribute name for datasource table 2023-06-30 18:56:41 +02:00
Miriam Baglioni 55ea485783 [UsageCount] split the count for result at the level of the datasource. for each indicator one unit is specified for each datasource contrinuting to that indicator value. The datasource key is the value of the key element in the unit for the measure, while the count for that datasource is in the value 2023-06-30 18:39:30 +02:00
Michele Artini 88a1cbc37d fixed a datasource id 2023-06-22 07:56:33 +02:00
Alessia Bardi d5be6a13e9 Updated officialnmae of pangaea in hostedbymap for Datacite to avoid duplicate entries in the source filter of the portal 2023-06-06 14:43:32 +02:00
Claudio Atzori 8acad52a0c Merge branch 'beta' into apc_affiliation 2023-05-15 15:47:33 +02:00
Claudio Atzori 8a463cc3e8 fixed organization id created when mapping APC affiliations. Factored out ROR constants in dhp-common 2023-05-15 15:44:46 +02:00
Miriam Baglioni 78b07400c0 changed test classes 2023-05-15 11:37:08 +02:00
Miriam Baglioni 86fe886c1a removed the inverse of the Citing relation 2023-05-15 11:20:51 +02:00
Serafeim Chatzopoulos 815a4ddbba Add actionset creation for project bip indicators in workflow 2023-04-26 20:40:06 +03:00
Serafeim Chatzopoulos ee04cf92bf Add actionsets for project impact indicators 2023-04-26 20:23:46 +03:00
Claudio Atzori 24e2fd828b code formatting 2023-03-08 21:17:08 +01:00
Miriam Baglioni 0fff98a14c [ECclassification] removed print 2023-03-02 11:46:57 +01:00
Miriam Baglioni b0c2f7e526 [ECclassification] removed not needed resources 2023-03-02 11:44:48 +01:00
Miriam Baglioni d4fc62c2f6 mergin with branch beta 2023-03-02 11:14:54 +01:00
Miriam Baglioni de8ad1caef [ECclassification] new implementation for the H2020 classification 2023-03-02 11:14:03 +01:00
Miriam Baglioni c1f9848953 [ECclassification] added new classes 2023-03-01 15:29:11 +01:00
Claudio Atzori 16ad42e8f3 code formatting 2023-03-01 10:22:13 +01:00
Miriam Baglioni 4f2df876cd [ECclassification] new implementation first try 2023-02-28 14:44:00 +01:00
Claudio Atzori 2f7346e9cf WIP monodirectional citations, Datacite 2023-02-28 13:30:51 +01:00
Claudio Atzori 7aebedb43c code formatting 2023-02-27 11:51:27 +01:00
Miriam Baglioni 80987801d7 [FoS] added check for null on level1 subject 2023-02-27 11:40:22 +01:00
Claudio Atzori 31e97c2a6b [unresolved entities] updated oozie wf node labels 2023-02-27 11:38:29 +01:00
Miriam Baglioni 23112929e9 [FoS] changed the default separator from comma to tab to solve the issue in subject value split 2023-02-27 10:18:39 +01:00
Claudio Atzori 0c1be41b30 code formatting 2023-02-22 10:15:25 +01:00
Claudio Atzori 477a7c416f Merge branch 'beta' into UsageCountOnProjectAndDatasource 2023-02-22 09:55:51 +01:00
Miriam Baglioni 016337a0f9 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2023-02-16 15:54:59 +01:00
Claudio Atzori 9a03f71db1 code formatting 2023-02-13 16:25:47 +01:00
Miriam Baglioni 5cf902a2b0 [UsageCount] changed query to make the sum be computed via sql instead of grouping 2023-02-10 16:16:37 +01:00
Miriam Baglioni f803530df6 [UsageCount] fixed query 2023-02-10 15:50:56 +01:00
Miriam Baglioni bb5bba51b3 [UsageCount] extended test 2023-02-09 19:08:30 +01:00
Miriam Baglioni 85e53fad00 [UsageCount] addition of usagecount for Projects and datasources. Extention of the action set created for the results with new entities for projects and datasources. Extention of the resource set and modification of the testing class 2023-02-09 18:59:45 +01:00
Michele Artini 7b7520850b fixed an invalid char 2023-01-11 09:22:18 +01:00
Sandro La Bruzzo 3c9826f186 updated lines function to it's implementation linesWithSeparators.map(l => l.stripLineEnd) in this way we force scala plugin compiler to consider this pipeline scala code and not java.string.lines() pipeline 2022-12-21 11:21:17 +01:00
Sandro La Bruzzo 72f0d88d6c formatted code 2022-10-19 14:18:42 +02:00
Sandro La Bruzzo a1f94530a3 added documentation 2022-10-13 11:47:11 +02:00
Claudio Atzori f3f7604e6c trying to fix a test that fails only on Jenkins 2022-09-27 15:21:37 +02:00
Claudio Atzori 27a91841e7 WIP: cleaning of subjects 2022-08-04 11:39:39 +02:00
Claudio Atzori eb53b52f7c code formatting 2022-08-02 13:24:47 +02:00
Claudio Atzori 209c7e9dab [datacite] avoid UnsupportedOperationException 2022-08-01 09:05:35 +02:00
Claudio Atzori 92e48f12f7 [metadata collection] updated collector plugin name 2022-07-29 13:54:00 +02:00
Claudio Atzori f62c4e05cd code formatting 2022-07-29 11:56:01 +02:00
Claudio Atzori ed98a6d9d0 [Datacite mapping] include the older datacite prefixed OpenAIRE id among the originalId[] 2022-07-28 10:15:14 +02:00
Sandro La Bruzzo 67525076ec fixed test, now it compiles after commit a6977197b3 2022-07-26 15:35:17 +02:00
Claudio Atzori d43663d30f adapted RorActionSet test, it should not create parent/child rels 2022-07-25 17:54:10 +02:00
Claudio Atzori c3ede1b379 Merge branch 'beta' into pubmed_update 2022-07-25 14:10:22 +02:00
Claudio Atzori 1138b2ac8e code formatting 2022-07-19 14:15:49 +02:00
Sandro La Bruzzo 00168303db Added unit test to verify the generation in the OriginalID the old openaire Identifier generated by OAI 2022-07-14 10:19:59 +02:00
Sandro La Bruzzo 0a4f4d98fa added PMCId to PmArticle 2022-07-13 15:27:17 +02:00
Claudio Atzori 0cb1c70788 code formatting 2022-07-01 10:44:08 +02:00
Claudio Atzori 929b145130 code formatting 2022-06-21 23:07:06 +02:00
Claudio Atzori 06b5533d4c Merge branch 'beta' into 7096-fileGZip-collector-plugin 2022-06-16 09:22:16 +02:00
Alessia Bardi 88d531dc91 exclude FAIRsharing records from Datacite 2022-06-13 16:17:17 +02:00
Claudio Atzori b8cda65487 code formatting 2022-06-13 09:20:03 +02:00
Michele Artini 634869ce95 deleted hierarchical rels from ror action set 2022-06-13 09:12:21 +02:00
Michele Artini b94a791bc5 unit tests to transform cnr explora 2022-06-09 12:25:34 +02:00
Claudio Atzori d098ad0d93 [hb patch] updated map 2022-05-16 15:54:04 +02:00
Claudio Atzori 1dda11e031 [hb patch] updated map 2022-05-16 15:53:27 +02:00
Claudio Atzori 8dd5517548 code formatting 2022-05-16 14:35:24 +02:00
Michele Artini 46c07e0724 deleted hierarchical rels from ror action set 2022-05-16 09:39:54 +02:00
Miriam Baglioni 89657a0b78 [UsageCount] refactoring 2022-05-09 14:43:27 +02:00
Miriam Baglioni a056f59c6e [UsageCount] make it as an action set as it should be, plus changed the test to make them work as well now 2022-05-09 12:51:35 +02:00
Serafeim Chatzopoulos 623f7be26d Fix reading files from HDFS in FileCollector & FileGZipCollector plugins 2022-04-28 16:31:11 +03:00
Claudio Atzori 30105f0722 Merge branch 'beta' into 7096-fileGZip-collector-plugin 2022-04-22 11:22:21 +02:00
Miriam Baglioni 20de75ca64 [Measures] removed typo 2022-04-21 12:14:03 +02:00
Miriam Baglioni b61efd613b [Measures] addressed comments in the PR 2022-04-21 12:09:37 +02:00
Claudio Atzori eabb40fccc Merge branch 'beta' into 7096-fileGZip-collector-plugin 2022-04-21 11:42:43 +02:00
Miriam Baglioni c304657d91 [Measures] put the logic in common, no need to change the schema 2022-04-21 11:27:26 +02:00
Miriam Baglioni 5295effc96 [Measures] fixed issue 2022-04-20 16:20:40 +02:00
Miriam Baglioni 5feae77937 [Measures] last changes to accomodate tests 2022-04-20 15:13:09 +02:00
Miriam Baglioni 869407c6e2 [Measures] added new measure (usagecounts) as action set. Measure added at the level of the result. Ref #7587 2022-04-20 14:02:05 +02:00
Serafeim Chatzopoulos d0b84d3297 Add FileCollectorPlugin and respective test 2022-04-07 15:06:38 +03:00
Serafeim Chatzopoulos bc1bf55507 Add AbstractSplittedRecordPlugin 2022-04-07 14:33:04 +03:00
Claudio Atzori c26222623f [maven-release-plugin] prepare for next development iteration 2022-04-07 13:32:22 +02:00
Claudio Atzori 86585a6b27 [maven-release-plugin] prepare release dhp-1.2.4 2022-04-07 13:32:19 +02:00
Claudio Atzori ad85d88eaf [maven-release-plugin] rollback the release of dhp-1.2.4 2022-04-07 13:28:35 +02:00
Claudio Atzori 598e11dfd7 [maven-release-plugin] prepare for next development iteration 2022-04-07 13:27:02 +02:00
Claudio Atzori db3d9877a5 [maven-release-plugin] prepare release dhp-1.2.4 2022-04-07 13:26:58 +02:00
Claudio Atzori 3bba6d6e38 [maven-release-plugin] rollback the release of dhp-1.2.4 2022-04-07 12:23:17 +02:00
Claudio Atzori 2ac2d928bd [maven-release-plugin] prepare for next development iteration 2022-04-07 12:18:47 +02:00
Claudio Atzori 85bc722ff4 [maven-release-plugin] prepare release dhp-1.2.4 2022-04-07 12:18:43 +02:00
Claudio Atzori bc05b6168a [maven-release-plugin] rollback the release of dhp-1.2.4 2022-04-07 11:49:06 +02:00
Claudio Atzori 505420fd61 [maven-release-plugin] prepare for next development iteration 2022-04-07 11:34:06 +02:00
Claudio Atzori 66e718981e [maven-release-plugin] prepare release dhp-1.2.4 2022-04-07 11:34:02 +02:00
Serafeim Chatzopoulos e612489670 Add fileGZip collector plugin and respective test 2022-04-06 19:12:44 +03:00
Claudio Atzori 8c457f1b2c conflicts resolved, merged from beta 2022-04-06 10:27:52 +02:00
Miriam Baglioni e77d104951 [OC] added / to workflow path 2022-04-05 15:07:11 +02:00
Sandro La Bruzzo 1b11010169 minor fix 2022-03-29 10:59:14 +02:00
Claudio Atzori a87c070447 conflicts resolved, merged from beta 2022-02-24 12:51:31 +01:00
Claudio Atzori 5226d0a100 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2022-02-18 15:21:07 +01:00
Claudio Atzori 401dd38074 code formatting 2022-02-18 15:19:15 +01:00
Sandro La Bruzzo 891781ee3f Merge branch 'beta' of code-repo.d4science.org:D-Net/dnet-hadoop into beta 2022-02-18 11:11:32 +01:00
Sandro La Bruzzo d3f03abd51 fixed wrong json path 2022-02-18 11:11:17 +01:00
Claudio Atzori 89c7313fc5 Merge branch 'beta' into hierarchical_orgs_relations 2022-02-17 10:30:04 +01:00
Sandro La Bruzzo 3aa2020b24 added script to regenerate hostedBy Map following instruction defined on ticket #7539
updated hosted By Map
2022-02-15 11:05:27 +01:00
Miriam Baglioni be64055cfe [OpenCitation] changed the name of destination folders 2022-02-14 15:49:44 +01:00
Miriam Baglioni 1490867cc7 [OpenCitation] cleaning of the COCI model 2022-02-14 14:52:12 +01:00
Miriam Baglioni 5c4043dba8 [OpenCitation] refactoring 2022-02-08 16:23:05 +01:00
Miriam Baglioni 759ed519f2 [OpenCitation] added logic to avoid the genration of self citations relations 2022-02-08 16:15:34 +01:00
Miriam Baglioni b071f8e415 [OpenCitation] change to extract in json format each folder just onece 2022-02-08 15:37:28 +01:00
Miriam Baglioni fbc28ee8c3 [OpenCitation] change the integration logic to consider dois with commas inside 2022-02-07 18:32:08 +01:00
Miriam Baglioni 493caef358 [stats-wf]fixed the result_result table related to PR#191 2022-02-04 14:51:25 +01:00
Miriam Baglioni 73eba34d42 [UnresolvedEntities] Changed the way to merge the unresolved because the new merge removed the dataInfo from the merged result. Added also data info for subjects 2022-02-01 08:38:41 +01:00
Claudio Atzori b37bc277c4 reintroduced the hostedby patching to the datacite records 2022-01-21 09:15:13 +01:00
Miriam Baglioni a75fb8c47a [BipFinderInstanceLevel] change pom to align to the dhp-schema release 2.10.24 and refactoring 2022-01-12 18:06:26 +01:00
Miriam Baglioni e7d5a39c03 [BipFinderInstanceLevel] added tests in test class 2022-01-12 17:25:04 +01:00
Miriam Baglioni 4993666d73 [BipFinderInstanceLevel] changed creation of the instance to allow to enrich existing instances with same pid 2022-01-12 16:53:47 +01:00
Sandro La Bruzzo 57e2c4b749 formatted code 2022-01-12 09:40:28 +01:00
Claudio Atzori dcd282977c pulled from beta 2022-01-11 16:59:41 +01:00
Claudio Atzori 4f212652ca scalafmt: code formatting 2022-01-11 16:57:48 +01:00
Miriam Baglioni b7e450070b [SDG-FOS] to import SDG file not considering the header 2022-01-07 12:13:26 +01:00
Miriam Baglioni 639190370a mergin with branch beta 2022-01-07 11:29:25 +01:00
Miriam Baglioni adccc2346a [SDG-FOS] to lower case for the doi 2022-01-07 11:28:50 +01:00
Claudio Atzori 3bd3653be9 OAF-store-graph mdstores: save them in text format 2022-01-04 16:39:39 +01:00
Claudio Atzori 3dc48c7ab5 OAF-store-graph mdstores: save them in text format 2022-01-04 16:39:27 +01:00
Claudio Atzori f82db765db OAF-store-graph mdstores: save them in text format 2022-01-04 16:39:15 +01:00
Claudio Atzori 9458ee7938 serialise records in the OAF-store-graph mdstores in json format. Read them again in the graph construction phase using a tolerant parser to support backward compatible changes in the evolution of the schema 2022-01-04 16:38:09 +01:00
Claudio Atzori 58f8998e3d OAF-store-graph mdstores: save them in text format 2022-01-04 15:02:09 +01:00
Claudio Atzori 174c3037e1 OAF-store-graph mdstores: save them in text format 2022-01-04 14:40:16 +01:00
Claudio Atzori 045d767013 OAF-store-graph mdstores: save them in text format 2022-01-04 14:23:01 +01:00
Claudio Atzori a6977197b3 serialise records in the OAF-store-graph mdstores in json format. Read them again in the graph construction phase using a tolerant parser to support backward compatible changes in the evolution of the schema 2022-01-03 17:25:26 +01:00
Miriam Baglioni 92fd69e25d [SDG-FOS] alternative way to get input data to avoid OOM error while getting csv 2022-01-03 15:23:06 +01:00
Miriam Baglioni 7a1b440413 [SDG] logic to create unresolved entities out of SDG input. This changes also some classes related to FOS to reuse the same code. The code under createunresolvedentities create results with the merged update of the the inputs provided (bip at the level of the isntance, fos and sdg for subjects) 2021-12-23 13:24:28 +01:00
Miriam Baglioni 2a67ee13ec [SDG] added model class 2021-12-23 10:37:52 +01:00
Miriam Baglioni 10579c0dd0 [FOS]fixed doi value in test 2021-12-22 23:10:16 +01:00
Miriam Baglioni 6116fc5d40 [FOS]added logic to include only different subjects. Test refactoring and extention 2021-12-22 23:04:22 +01:00
Miriam Baglioni b81efb6a9d [FOS]changed the mapping between the csv and the model. Changed Test classes and resources 2021-12-22 21:40:35 +01:00
Miriam Baglioni de6c4c8968 [FOS]creation of the unresolved entities: remove the split for the doi: no more needed since each row is related to one doi 2021-12-22 16:44:44 +01:00
Miriam Baglioni 34ac56565d refactoring 2021-12-22 16:28:11 +01:00
Miriam Baglioni 20ef1d657f refactoring 2021-12-22 16:26:36 +01:00
Miriam Baglioni 813f856d3f [BipFinder] removing left over parameter in wf 2021-12-22 16:11:12 +01:00
Miriam Baglioni 2c126ed014 [BipFinder] create unresolved entities with measures at the level of the instance 2021-12-22 16:03:41 +01:00
Miriam Baglioni 0807fdb65a [BipFinder] remove not needed resources 2021-12-22 15:37:00 +01:00
Miriam Baglioni b5e11a3a0a [BipFinder] put in common package BipFinder model 2021-12-22 15:33:05 +01:00
Miriam Baglioni c5739c4266 [BipFinder] create action set for the measures at the level of the result 2021-12-22 15:08:33 +01:00
Miriam Baglioni e24a7f3496 mergin with branch beta 2021-12-21 13:57:19 +01:00
Miriam Baglioni d1ae219cb4 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2021-12-21 13:55:53 +01:00
Sandro La Bruzzo 3920d68992 Fixed workflow generation of delta in datacite 2021-12-21 11:41:49 +01:00
Sandro La Bruzzo b881ee5ef8 [scholexplorer]
- implemented generation of scholix of delta update of datacite
2021-12-15 11:25:32 +01:00
Miriam Baglioni 22d4b5619b [BipFinder Result] last changes to test and resources files 2021-12-14 14:54:13 +01:00
Miriam Baglioni 6fb6236cd4 changed the way to produce the AS for bipFinder. 2021-12-14 14:51:14 +01:00
Miriam Baglioni 4eb8276493 - 2021-12-14 11:12:17 +01:00
Miriam Baglioni b113586207 resolved conflicts 2021-12-07 10:16:14 +01:00
Miriam Baglioni d9836f0cf3 [OpenCitations] fixed test when executed one after the other 2021-12-06 15:27:09 +01:00
Sandro La Bruzzo 7af0bbd0b1 [scala-refactor] Module dhp-aggregation:
Moved all scala source into src/main/scala and src/test/scala
2021-12-06 11:26:36 +01:00
Miriam Baglioni f430688ff7 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2021-12-03 12:36:08 +01:00
Sandro La Bruzzo f7011b90d8 format code 2021-12-03 11:15:09 +01:00
Miriam Baglioni 87eedad898 - 2021-12-02 13:17:19 +01:00
Sandro La Bruzzo feea154e89 remove working dir after test 2021-11-25 11:02:38 +01:00
Sandro La Bruzzo 028a8acad8 add test resources 2021-11-25 10:54:47 +01:00
Sandro La Bruzzo 2164a2a889 Datacite: Code Refactor generated a general SparkApplication Scala where all the spark scala have to inherit
Commented a little the Datacite transformation code
2021-11-25 10:54:13 +01:00
Sandro La Bruzzo a7cf277d98 Datacite: Removed HostedBy Patch as described on ticket #7219, Now all the records will have hosted by Unknown Repository 2021-11-22 16:03:17 +01:00
Sandro La Bruzzo fc03c99805 fixed javadocs url after deploying site 2021-11-19 10:46:33 +01:00
Claudio Atzori bd9a43cefd Revert to 4094f2bb9a 2021-11-19 09:20:43 +01:00
Claudio Atzori 3a4d925386 Merge branch 'beta' into hierarchical_orgs_relations 2021-11-18 18:07:08 +01:00
Sandro La Bruzzo 2506d7a679 Merge branch 'mvn_site_documentation' of code-repo.d4science.org:D-Net/dnet-hadoop into mvn_site_documentation 2021-11-17 11:07:24 +01:00
Sandro La Bruzzo cded363b55 code refactor, created and moved scala code on the correct maven folder under src/main/scala and src/test/scala 2021-11-17 11:06:35 +01:00
Miriam Baglioni 4094f2bb9a added integration md file 2021-11-17 10:04:52 +01:00
Miriam Baglioni ec8b0219ff [Documentation] Added first page for Integration via unresolved entities generation 2021-11-16 17:41:34 +01:00
Sandro La Bruzzo 2d67020c59 added dhp-enrichment maven site template 2021-11-16 16:01:08 +01:00
Claudio Atzori bafa2990f3 code formatting 2021-11-15 17:07:16 +01:00
Sandro La Bruzzo efa09057db Merge branch 'beta' of code-repo.d4science.org:D-Net/dnet-hadoop into beta 2021-11-15 14:32:09 +01:00
Sandro La Bruzzo 48923e46a1 added documentation to Pubmed Class and also added mvn site for dhp-aggregations 2021-11-15 14:32:01 +01:00
Miriam Baglioni 4ec88c718c merge with beta - resolved conflict in pom 2021-11-15 10:52:16 +01:00
Miriam Baglioni 6f1a434e90 [Bypass Action Set] Fixed test to consider the new identifier utils 2021-11-15 09:59:23 +01:00
Miriam Baglioni 157d33ebf9 [Bypass Action Set] Refactoring 2021-11-15 09:58:48 +01:00
Miriam Baglioni 92d0e18b55 [Bypass Action Set] used constant DOI instead of "doi" 2021-11-12 10:56:58 +01:00
Miriam Baglioni 881113743f [Bypass Action Set] refactoring 2021-11-12 10:55:50 +01:00
Miriam Baglioni 47ccb53c4f [Bypass Action Set] modification for comment D-Net/dnet-hadoop#157 (comment) 2021-11-12 10:54:09 +01:00