Commit Graph

2406 Commits

Author SHA1 Message Date
Michele Artini 83132ee99a fixed a problem with empty mdstore list 2021-06-14 11:57:00 +02:00
Miriam Baglioni cf360d7c97 Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids 2021-06-14 10:19:49 +02:00
Miriam Baglioni 8873e6b6d1 workflow and parameter 2021-06-14 10:15:57 +02:00
Miriam Baglioni 0f1acdf6b6 workflow and parameter 2021-06-14 10:08:55 +02:00
Sandro La Bruzzo aeb8132627 Merged branch stable_ids 2021-06-14 10:07:29 +02:00
Sandro La Bruzzo efbea1e01a minor fix 2021-06-14 09:45:14 +02:00
Miriam Baglioni 75780fc636 extraction of the tar for the dump of crossref, and creation of the dataset 2021-06-14 09:45:07 +02:00
Claudio Atzori 2039bb9f5f orcid / orcid_pending cleaning backported from master branch 2021-06-14 09:40:50 +02:00
Claudio Atzori dd19c4ac5a Merge pull request 'import_new_mdstores' (#112) from import_new_mdstores into stable_ids
Reviewed-on: D-Net/dnet-hadoop#112
2021-06-14 09:23:55 +02:00
Claudio Atzori e9e86a237d Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids 2021-06-11 17:00:02 +02:00
Claudio Atzori a900bfb874 delegating the date parsing to https://github.com/sisyphsu/dateparser 2021-06-11 16:53:01 +02:00
Sandro La Bruzzo dd997c49e0 fix wrong relation id
fix date thai ticket #6791
2021-06-10 14:47:18 +02:00
Antonis Lempesis d413b24611 added instances, orgs for monitor, totalcost for projects, apcs 2021-06-10 02:35:46 +03:00
Claudio Atzori 741077dbca Merge pull request 'Fix in Affiliation Propagation' (#113) from miriam.baglioni/dnet-hadoop:master into stable_ids
Reviewed-on: D-Net/dnet-hadoop#113
2021-06-09 18:42:42 +02:00
Miriam Baglioni 32b0c27217 Update 'dhp-workflows/dhp-enrichment/src/main/java/eu/dnetlib/dhp/resulttoorganizationfrominstrepo/PrepareResultInstRepoAssociation.java'
fix in SQL query: while writing the blacklist constraint it used d.id to indicate the datasource id, but no alias for the datasource was defined. So I removed the alias
2021-06-09 18:36:11 +02:00
Sandro La Bruzzo 0d1f37302f Merge branch 'stable_ids' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_id_scholexplorer 2021-06-09 09:35:16 +02:00
Miriam Baglioni dc07f1079b added check in case the author set to be enriched is null 2021-06-08 12:06:10 +02:00
Miriam Baglioni 8d2e086e48 changes to avoid reassignment to val 2021-06-07 17:50:37 +02:00
Miriam Baglioni f33521d338 Update 'dhp-workflows/dhp-doiboost/src/main/java/eu/dnetlib/doiboost/orcid/SparkConvertORCIDToOAF.scala'
to be able to replace the object assigned to author, val has been replaced by var
2021-06-07 17:27:07 +02:00
Miriam Baglioni bc12e9819e Update 'dhp-workflows/dhp-doiboost/src/main/java/eu/dnetlib/doiboost/orcid/SparkConvertORCIDToOAF.scala'
The change fixes the issue that arises when the same work appears more than once on the same ORCID profile. It avoids replicating the association doi -> author when the orcid id is already associated with the doi.
2021-06-07 16:37:01 +02:00
Sandro La Bruzzo 0cdb7ccdaa added inverse relations to datacite mapping 2021-06-04 15:10:20 +02:00
Sandro La Bruzzo 5b724d9972 added relations to datacite mapping 2021-06-04 10:14:22 +02:00
Sandro La Bruzzo e57294ac99 implemented changes on PUBMed dataflow 2021-06-03 10:52:09 +02:00
Michele Artini ede2749822 orcid pid type 2021-06-01 12:42:43 +02:00
Michele Artini f0fbfdcfae Merge branch 'stable_ids' into import_new_mdstores 2021-06-01 12:03:00 +02:00
Michele Artini e950750262 add nodes to import hdfs mdstores 2021-06-01 10:48:50 +02:00
Michele Artini 03a510859a removed coalesce(1) 2021-05-31 14:10:51 +02:00
Michele Artini e9f2b6037c patch of mdstore records 2021-05-31 11:36:26 +02:00
Sandro La Bruzzo 02ef46535f Merge branch 'stable_ids' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_ids 2021-05-31 09:50:15 +02:00
Sandro La Bruzzo aeadc5a366 updated wf Datacite Import to retrieve the block size as parameter 2021-05-31 09:49:53 +02:00
Claudio Atzori 96238152cb added serialization for alternateIdentifiers and pids within each record instance 2021-05-28 16:57:30 +02:00
Michele Artini ad56a44fda save as gzipped sequence file 2021-05-28 14:45:39 +02:00
Claudio Atzori 83722ebc47 pull #111 replied on stable_ids 2021-05-28 14:11:46 +02:00
Claudio Atzori 6e3a4e9237 updated test expectations 2021-05-28 09:37:50 +02:00
Michele Artini 4fa5671d16 first implementation of Hdfs Mdstores Importer 2021-05-27 16:22:07 +02:00
Claudio Atzori d512062b58 integrating pull #109, H2020Classification 2021-05-27 12:22:47 +02:00
Claudio Atzori 5e4b91d9ef more pervasive use of constants from ModelConstants, especially for ORCID 2021-05-26 18:20:23 +02:00
Sandro La Bruzzo bced804151 updated wf Datacite Import to retrieve the block size as parameter 2021-05-26 17:06:50 +02:00
Miriam Baglioni abd88f663d changed test resource to mirror change in the input file 2021-05-21 15:20:47 +02:00
Miriam Baglioni c844877de2 changed workflow flow to possibly parallelize also the programme and project preparation steps 2021-05-21 14:41:57 +02:00
Miriam Baglioni 073d76864d refactoring 2021-05-21 14:41:03 +02:00
Miriam Baglioni 4c8b4a774c removed unneeded code 2021-05-21 14:40:07 +02:00
Miriam Baglioni 53b9d87fec new prepareProgramme according to the new file 2021-05-21 11:49:31 +02:00
Miriam Baglioni 1ee8f13580 refactoring and added "left" as join type to be 100% sure to get the whole set of projects 2021-05-21 11:49:05 +02:00
Miriam Baglioni e07c3ba089 due to change in the input file the filtering step is no longer needed 2021-05-21 11:47:43 +02:00
Miriam Baglioni 54f6e2f693 changed to get the needed information to build the action set as parallel jobs 2021-05-21 11:47:00 +02:00
Miriam Baglioni 7180505519 removed unneeded variable 2021-05-21 11:46:13 +02:00
Miriam Baglioni 2eb1a8b344 changed because the input file changed 2021-05-21 11:40:20 +02:00
Claudio Atzori 9d725efdc1 reverted implementation of the mdstore client 2021-05-20 18:26:09 +02:00
Miriam Baglioni 9610224671 added param to workflow property 2021-05-20 18:21:12 +02:00
Claudio Atzori 863b56b6ce using constants from ModelConstants 2021-05-20 16:23:58 +02:00
Claudio Atzori ae5c28e54f code formatting 2021-05-20 16:13:06 +02:00
Miriam Baglioni aa45b4df9b - 2021-05-20 15:57:40 +02:00
Miriam Baglioni 052c837843 - 2021-05-20 15:54:44 +02:00
Claudio Atzori b695932ae4 integrated pull#108 2021-05-20 15:34:04 +02:00
Claudio Atzori b572f56763 Merge branch 'master' into master 2021-05-20 15:22:35 +02:00
Claudio Atzori 2578b7fbb3 code formatting 2021-05-20 14:59:02 +02:00
Miriam Baglioni dc0ad8d2e0 fixed issue related to change in the file name downloaded. Added sheet name as parameter and also a check if the name should change 2021-05-20 14:53:53 +02:00
Claudio Atzori 232dce83db fixes #6701: xpath for titles to support both datacite and Guidelines v4 mapping 2021-05-20 14:41:15 +02:00
Claudio Atzori aef2977ad0 fixes #6701: xpath for titles to support both datacite and Guidelines v4 mapping 2021-05-20 14:40:22 +02:00
Miriam Baglioni 02b80cf24f resolved conflicts 2021-05-20 10:59:39 +02:00
Claudio Atzori c4a23c2f4d fix: preserving the old identifier among the originalIds in the doiboost construction process, trying to avoid UnsupportedOperationException while adding elements to the originalIds 2021-05-19 16:01:52 +02:00
Claudio Atzori ba03f549d7 fix: preserving the old identifier among the originalIds in the doiboost construction process 2021-05-19 15:43:26 +02:00
Claudio Atzori 239d0f0a9a ROR actionset import workflow backported from branch stable_ids 2021-05-18 16:12:11 +02:00
Antonis Lempesis 168edcbde3 added the final steps for the observatory promote wf and some cleanup 2021-05-18 15:23:20 +03:00
Michele Artini e56ccec536 Merge branch 'stable_ids' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_ids 2021-05-18 14:00:28 +02:00
Michele Artini c1e20de7cf fixed the deserialization of a json property 2021-05-18 14:00:14 +02:00
Claudio Atzori a9f512103b using constants from ModelConstants 2021-05-18 11:19:07 +02:00
Claudio Atzori eeb8bcf075 using constants from ModelConstants 2021-05-18 11:10:07 +02:00
Claudio Atzori 2cbf15f4fb using ModelConstants 2021-05-17 09:54:45 +02:00
Claudio Atzori f19feceaf0 set the old identifier before switching to the new one 2021-05-14 12:53:40 +02:00
Claudio Atzori 1bd70fa2c6 preserving the old identifier among the originalIds in the doiboost construction process 2021-05-14 11:30:41 +02:00
Claudio Atzori ca3f3a7687 using ModelConstants 2021-05-14 11:29:49 +02:00
Claudio Atzori 23b8883ab1 applied intellij code cleanup 2021-05-14 10:58:12 +02:00
Claudio Atzori 609eb711b3 IndexRecordTransformerTest for producing a record that can be manually submitted to solr 2021-05-13 16:13:28 +02:00
Claudio Atzori 1517bf7c92 IndexRecordTransformerTest for producing a record that can be manually submitted to solr 2021-05-13 16:11:22 +02:00
Sandro La Bruzzo d9a0bbda7b implemented new phase in doiboost to make the dataset Distinct by ID 2021-05-13 12:25:14 +02:00
Sandro La Bruzzo 6424cd9062 Added passing of the following parameters:
-varDataSourceId
-varOfficialName

in Each transformation Rule
2021-05-11 15:17:38 +02:00
Sandro La Bruzzo 073dcea2aa Added passing of the following parameters:
-varDataSourceId
-varOfficialName

in Each transformation Rule
2021-05-11 15:05:58 +02:00
Claudio Atzori d4c3476152 mapping datasource.journal only when an issn is available, null otherwise 2021-05-11 11:08:54 +02:00
Claudio Atzori da9d6f3887 mapping datasource.journal only when an issn is available, null otherwise 2021-05-11 10:45:30 +02:00
Sandro La Bruzzo 54217d73ff removed old parameters from oozie workflow 2021-05-11 09:59:02 +02:00
Claudio Atzori d1cbee8413 imported methods from CleaningFunctions, defined in GraphCleaningFunctions 2021-05-10 16:43:39 +02:00
Claudio Atzori 3797543600 MDStoreManager model classes moved in dhp-schemas 2021-05-10 14:32:05 +02:00
Claudio Atzori 25254885b9 [ActionManagement] reduced number of xqueries used to access ActionSet info 2021-05-07 17:32:03 +02:00
Claudio Atzori 8a0de2fc18 [ActionManagement] reduced number of xqueries used to access ActionSet info 2021-05-07 17:31:32 +02:00
Sandro La Bruzzo 7dc824fc23 imported changes in stable_id into master 2021-05-07 12:53:50 +02:00
Michele Artini d82071ba6c originalId with prefix 2021-05-06 15:34:48 +02:00
Claudio Atzori d4a30fabe3 clean up tests 2021-05-05 17:28:15 +02:00
Claudio Atzori dccaf173cf fixed mapping applied to ODF records. Added unit test to verify the mapping for OpenTrials 2021-05-05 16:36:15 +02:00
Claudio Atzori 8c96a82a03 fixed mapping applied to ODF records. Added unit test to verify the mapping for OpenTrials 2021-05-05 15:30:06 +02:00
Claudio Atzori 2e1eb96f9a code formatting 2021-05-05 11:23:57 +02:00
Sandro La Bruzzo 1adfc41d23 merged manually changes on stable_id for doiboost into master 2021-05-05 10:23:32 +02:00
Claudio Atzori fb930b84d3 Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids 2021-05-04 18:06:30 +02:00
Claudio Atzori 923d19ea8e mdstore read lock/unlock when bulk copying records from mongodb to hdfs 2021-05-04 18:06:21 +02:00
Sandro La Bruzzo 714b71bd21 updated pubmed 2021-05-04 14:54:12 +02:00
Claudio Atzori ba86835951 using common constants from ModelConstants 2021-05-04 11:51:52 +02:00
Michele Artini f4bd2b5619 revert file SparkDedupTest.java 2021-05-04 10:26:14 +02:00
Michele Artini b4877da363 Merge branch 'stable_ids' into prepare_ror_actionset 2021-05-03 08:13:55 +02:00
Alessia Bardi 9a20057615 fixed query for organisations' pids 2021-04-29 15:23:39 +02:00