Compare commits

...

153 Commits

Author SHA1 Message Date
Claudio Atzori 242d647146 cleanup & docs 2023-10-12 12:23:44 +02:00
Claudio Atzori af3ffad6c4 [AMF] docs 2023-10-12 10:07:52 +02:00
Claudio Atzori ba5475ed4c Merge pull request 'Fix cleaning of Pmid where parsing of numbers stopped at first not leading 0 (zero) character' (#345) from fix_truncated_pmid into master
Reviewed-on: D-Net/dnet-hadoop#345
2023-10-06 14:19:49 +02:00
Giambattista Bloisi 2c235e82ad Fix cleaning of Pmid where parsing of numbers stopped at first not leading 0' character 2023-10-06 12:35:54 +02:00
Claudio Atzori 4ac06c9e37 Merge pull request 'Fix bug in conversion from dedup json model to Spark Dataset of Rows (instanceTypeMatch no longer working)' (#339) from fix_dedupfailsonmatchinginstances into master
Reviewed-on: D-Net/dnet-hadoop#339
2023-10-02 11:34:20 +02:00
Claudio Atzori fa692b3629 Merge branch 'master' into fix_dedupfailsonmatchinginstances 2023-10-02 11:28:16 +02:00
Claudio Atzori ef02648399 Merge pull request 'fixed dedup configuration management in the Broker workflow' (#341) from fix_8997 into master
Reviewed-on: D-Net/dnet-hadoop#341
2023-10-02 11:03:50 +02:00
Claudio Atzori d13bb534f0 Merge branch 'master' into fix_8997 2023-10-02 11:03:18 +02:00
Giambattista Bloisi 775c3f704a Fix bug in conversion from dedup json model to Spark Dataset of Rows: list of strings contained the json escaped representation of the value instead of the plain value, this caused instanceTypeMatch failures because of the leading and trailing double quotes 2023-09-27 22:30:47 +02:00
Sandro La Bruzzo 9c3ab11d5b Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2023-09-25 15:29:19 +02:00
Sandro La Bruzzo 423ef30676 minor fix on the aggregation of uniprot and pdb 2023-09-25 15:28:58 +02:00
Giambattista Bloisi 7152d47f84 Use asScala to convert java List to Scala Sequence 2023-09-20 16:14:27 +02:00
Claudio Atzori 4853c19b5e code formatting 2023-09-20 15:53:21 +02:00
Giambattista Bloisi 1f226d1dce Fix defect #8997: GenerateEventsJob is generating huge amounts of logs because broker entity similarity calculation consistently failed 2023-09-20 15:42:00 +02:00
Alessia Bardi 6186cdc2cc Use v5 of the UNIBI Gold ISSN list in test 2023-09-19 14:47:01 +02:00
Alessia Bardi d94b9bebf7 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2023-09-19 13:38:45 +02:00
Alessia Bardi 19abba8fa7 tests for d4science catalog 2023-09-19 13:38:25 +02:00
Claudio Atzori c2f179800c Merge pull request 'Run CC and RAM sequentieally in dhp-impact-indicators WF' (#338) from run_cc_and_ram_sequentially into master
Reviewed-on: D-Net/dnet-hadoop#338
2023-09-13 08:52:53 +02:00
Serafeim Chatzopoulos 2aed5a74be Run CC and RAM sequentieally in dhp-impact-indicators WF 2023-09-12 22:31:50 +03:00
Claudio Atzori 4dc4862011 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2023-09-12 14:34:34 +02:00
Claudio Atzori dc80ab14d3 [graph dedup] consistency wf should not remove the relations while dispatching the entities 2023-09-12 14:34:28 +02:00
Alessia Bardi 77a2199837 updated test for EOSC comunity 2023-09-08 11:05:49 +02:00
Claudio Atzori 265180bfd2 added Archive ouverte UNIGE (ETHZ.UNIGENF, opendoar____::1400) to the Datacite hostedBy_map 2023-09-07 11:20:35 +02:00
Claudio Atzori da0e9828f7 resolved conflicts for PR#337 2023-09-06 11:28:46 +02:00
Claudio Atzori 0bc74e2000 code formatting 2023-08-02 11:52:10 +02:00
Claudio Atzori 7180911ded [graph cleaning] fixed regex behaviour for cleaning ROR and GRID identifiers, added tests 2023-08-02 11:44:14 +02:00
Claudio Atzori da1727f93f rule out records with NULL dataInfo, except for Relations 2023-07-31 17:52:56 +02:00
Claudio Atzori ccac6a7f75 rule out records with NULL dataInfo 2023-07-31 12:35:05 +02:00
Claudio Atzori d512df8612 code formatting 2023-07-26 09:14:08 +02:00
Claudio Atzori 59764145bb cherry picked & fixed commit 270df939c4 2023-07-25 17:39:00 +02:00
Claudio Atzori 373a5f2c83 Merge pull request 'Master branch updates from beta July 2023' (#317) from master_july23 into master
Reviewed-on: D-Net/dnet-hadoop#317
2023-07-18 18:22:04 +02:00
Claudio Atzori 8af129b0c7 merged stats promotion step from antonis/promotion-prod-only 2023-07-13 15:03:28 +02:00
dimitrispie 706092bc19 Update updateProductionViews.sh 2023-07-13 15:48:12 +03:00
dimitrispie aedd279f78 Updates Promotion DBs
- Add a step for promoting the splitted monitor DBs
2023-07-13 15:35:46 +03:00
Miriam Baglioni 8dcd028eed [UsageCount] fixed typo in attribute name for datasource table 2023-07-01 16:07:22 +02:00
Claudio Atzori f3a85e224b merged from branch beta the bulk tagging (single step, negative constraints), the cleanig worflow (single step, pid type based cleaning), instance level fulltext 2023-06-28 13:33:57 +02:00
Claudio Atzori 4ef0f2ec26 added dependency commons-validator:commons-validator:1.7 2023-06-28 13:32:01 +02:00
Claudio Atzori 288ec0b7d6 [doiboost] merged workflow from branch beta 2023-06-28 09:15:37 +02:00
Claudio Atzori 5f32edd9bf adopting dhp-schema:3.17.1 2023-06-27 16:57:17 +02:00
Claudio Atzori e10ce92fe5 [stats wf] merged workflows from branch beta 2023-06-27 14:32:48 +02:00
Claudio Atzori b93e1541aa Merge pull request 'update sql query to return distinct pids' (#301) from distinct_pids_from_openorgs into master
Reviewed-on: D-Net/dnet-hadoop#301
2023-06-27 12:24:47 +02:00
Claudio Atzori d029bf0b94 Merge branch 'master' into distinct_pids_from_openorgs 2023-06-27 12:24:35 +02:00
Michele Artini 009d7f312f fixed a datasource Id 2023-06-21 16:17:34 +02:00
Giambattista Bloisi 758e662ab8 Revert "REmove duplicated code and ensure that load and initialization is done through "DedupConfig.load" method"
This reverts commit 485f9d18cb.
2023-06-19 13:08:10 +02:00
Giambattista Bloisi 485f9d18cb REmove duplicated code and ensure that load and initialization is done through "DedupConfig.load" method 2023-06-19 13:00:02 +02:00
Michele Artini a92206dab5 re-added the name of a column (pid) 2023-06-13 11:43:10 +02:00
Alessia Bardi 118e72d7db Updated officialnmae of pangaea in hostedbymap for Datacite to avoid duplicate entries in the source filter of the portal 2023-06-06 14:39:12 +02:00
Alessia Bardi 5befd93d7d test records for Solr indexing 2023-06-06 14:34:33 +02:00
Michele Artini cae92cf811 update sql query to return distinct pids 2023-06-06 14:06:06 +02:00
Claudio Atzori 654ffcba60 Merge pull request '[UsageCount] addition of usagecount for Projects and datasources' (#296) from master_datasource_project_usagecounts into master
Reviewed-on: D-Net/dnet-hadoop#296
2023-05-22 16:13:24 +02:00
Claudio Atzori db625e548d [UsageCount] addition of usagecount for Projects and datasources 2023-05-22 15:00:46 +02:00
Alessia Bardi 04141fe259 tests for records from D4Science catalogues 2023-05-19 14:28:24 +02:00
Alessia Bardi b88f009d9f combined level 4 and 6 for the demo 2023-04-24 12:10:33 +02:00
Alessia Bardi 5ffe82ffd8 aligned to current DMF index layout on production 2023-04-24 12:09:55 +02:00
Alessia Bardi 1c173642f0 removed level5 from test records 2023-04-24 09:32:32 +02:00
Alessia Bardi 382f46a8e4 tests to generate the XML records for the index for the EDITH demo on digital twins, integrating output from the FoS classifier 2023-04-21 16:46:30 +02:00
Miriam Baglioni 24c41806ac [ZenodoApiClienttest] change test to mirror change in the omplementation 2023-04-18 09:08:09 +02:00
Miriam Baglioni 087b5a7973 [ZenodiAPIClient] new version of the API to connect to Zenodo (change the http client 2023-04-17 18:59:22 +02:00
Claudio Atzori 688e3b7936 added eoscifguidelines in the result view; removed compute statistics statements 2023-04-11 11:45:56 +02:00
Claudio Atzori 2e465915b4 [graph to Solr] using dedicated sparkExecutorCores, sparkExecutorMemory, sparkDriverMemory in convert_to_xml 2023-04-11 10:43:44 +02:00
Claudio Atzori 4a4ca634f0 Merge pull request 'advConstraintsInBeta' (#288) from advConstraintsInBeta into master
Reviewed-on: D-Net/dnet-hadoop#288
2023-04-06 15:24:23 +02:00
Miriam Baglioni c6a7602b3e refactoring after compilation 2023-04-06 14:45:01 +02:00
Miriam Baglioni 831055a1fc change of the property for test purposes, addition of two new verbs, and fix of issue for advanced constraints 2023-04-06 14:41:32 +02:00
Miriam Baglioni cf3d0f4f83 fixed issue on bulktagging for the advanced constraints 2023-04-06 12:17:35 +02:00
Claudio Atzori 4f67225fbc Merge pull request 'doiboostMappingExtention' (#286) from doiboostMappingExtention into master
Reviewed-on: D-Net/dnet-hadoop#286
2023-04-06 09:25:08 +02:00
Claudio Atzori e093f04874 Merge pull request 'AdvancedConstraint' (#285) from advConstraintsInBeta into master
Reviewed-on: D-Net/dnet-hadoop#285
2023-04-06 09:24:54 +02:00
Miriam Baglioni c5a9f39141 Extended the association project - result in the mapping from CrossRef 2023-04-05 16:48:36 +02:00
Miriam Baglioni ecc05fe0f3 Added the code for the advancedConstraint implementation during the bulkTagging 2023-04-05 16:40:29 +02:00
Claudio Atzori 42442ccd39 Merge pull request 'updated the order of the compatibilities' (#275) from compatibility_order into master
Reviewed-on: D-Net/dnet-hadoop#275
2023-04-05 12:44:14 +02:00
Miriam Baglioni 9a9cc6a1dd changed the way the tar archive is build to support renaming in case we need to change .tt.gz into .json.gz 2023-04-04 11:40:58 +02:00
Michele Artini 200098b683 updated the order of the compatibilities 2023-02-22 11:52:59 +01:00
Michele Artini 9c1df15071 null values in date range conditions 2023-02-13 16:05:58 +01:00
Miriam Baglioni 32870339f5 refactoring after compile 2023-02-13 13:06:48 +01:00
Miriam Baglioni 7184cc0804 [FoS] added check for null on level1 subject 2023-02-13 13:03:49 +01:00
Miriam Baglioni 7473093c84 [FoS] changed the default separator from comma to tab to solve the issue in subject value split 2023-02-10 15:34:52 +01:00
Miriam Baglioni 5f0906be60 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2023-02-02 17:13:14 +01:00
Claudio Atzori 1b37516578 [bulk tagging] better node naming 2023-01-20 16:11:26 +01:00
Claudio Atzori c1e2460293 [cleaning] the datasource master-duplicate fixup should not be brought to production yet 2023-01-20 09:20:26 +01:00
Claudio Atzori 3800361033 [country propagation] fixes error 'cannot resolve countrySet given input columns: []' when there is no prepared information driving the propagation process for a given result type 2023-01-19 15:57:43 +01:00
Michele Artini 699736addc NPE prevention 2023-01-11 13:14:44 +01:00
Claudio Atzori f86e19b282 code formatting 2023-01-11 09:53:19 +01:00
Michele Artini d40e20f437 Considering instance pids and alteternative identifiers 2023-01-11 09:37:34 +01:00
Michele Artini 4953ae5649 fixed an invalid char 2023-01-11 08:35:53 +01:00
Miriam Baglioni c60d3a2b46 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2023-01-09 17:28:27 +01:00
Claudio Atzori 7becdaf31d Merge pull request 'Workaround to use new version of intellij on Master' (#266) from master_intellij into master
Reviewed-on: D-Net/dnet-hadoop#266
2022-12-23 10:32:21 +01:00
Miriam Baglioni b713132db7 [Cleaning] adding missing classes 2022-12-21 12:49:08 +01:00
Miriam Baglioni 11f2b470d3 [Cleaning] adding missing classes 2022-12-21 12:42:19 +01:00
Sandro La Bruzzo 91c70b15a5 updated lines function to it's implementation linesWithSeparators.map(l => l.stripLineEnd) in this way we force scala plugin compiler to consider this pipeline scala code and not java.string.lines() pipeline 2022-12-21 11:14:42 +01:00
Claudio Atzori f910b7379d [cleaning] recovering missing resources from D-Net/dnet-hadoop#265 2022-12-21 09:26:34 +01:00
Claudio Atzori 33bdad104e [cleaning] align parameter names 2022-12-20 21:43:59 +01:00
Claudio Atzori 5816ded93f code formatting 2022-12-20 10:41:40 +01:00
Claudio Atzori 46972f8393 [orcid propagation] skip empty directory 2022-12-20 10:28:22 +01:00
Claudio Atzori da85ca697d Merge pull request 'cleanCountryOnMaster' (#265) from cleanCountryOnMaster into master
Reviewed-on: D-Net/dnet-hadoop#265
2022-12-16 15:58:44 +01:00
Miriam Baglioni 059e100ec7 [Clean Country] moving other resources for testing purposes 2022-12-16 15:48:21 +01:00
Miriam Baglioni fc95a550c3 [Clean Country] moving other resources for testing purposes 2022-12-16 15:46:32 +01:00
Miriam Baglioni 6901ac91b1 [Clean Country] moving source and resources to master 2022-12-16 15:42:49 +01:00
Claudio Atzori 08c4588d47 Merge pull request 'Changes from beta stats wf to prod' (#264) from antonis.lempesis/dnet-hadoop:beta into master
Reviewed-on: D-Net/dnet-hadoop#264
2022-12-07 15:56:22 +01:00
Miriam Baglioni 29d3da85f1 [EOSC DUMP] added resources needed for the review as test 2022-11-25 17:16:20 +01:00
Miriam Baglioni 33a2b1b5dc [Bulk Tag] fixed typo in test configuration 2022-11-23 11:31:17 +01:00
Miriam Baglioni c6df8327b3 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2022-11-23 11:26:57 +01:00
Miriam Baglioni 935aa367d8 [BulkTag] removed commented code 2022-11-23 11:16:39 +01:00
Miriam Baglioni 43aedbdfe5 [BulkTag] changed verb name in configuration 2022-11-23 11:14:23 +01:00
Miriam Baglioni b6da9b67ff [BulkTag] fixed typo in annotation for verb name 2022-11-23 11:13:58 +01:00
Claudio Atzori a34c8b6f81 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2022-11-22 10:22:31 +01:00
Miriam Baglioni 122e75aa17 fixed conflicts 2022-11-21 18:13:12 +01:00
Miriam Baglioni cee7a45b1d [Bulk Tag Datasource] fixed issue with verb name and add new test for neanias selection for orcid 2022-11-21 18:10:20 +01:00
Claudio Atzori ed64618235 increased spark.sql.shuffle.partitions in the last join phase of the result (publication) to community through semantic relation propagation 2022-11-18 16:06:51 +01:00
Claudio Atzori 8742934843 added spark.sql.shuffle.partitions in the last join phase of the result to community through semantic relation propagation 2022-11-18 11:32:22 +01:00
Claudio Atzori 13cc592f39 code formatting 2022-11-15 09:37:57 +01:00
Claudio Atzori af15b1e48d [eosc tag] extending criteria for Jupyter Notebook (adding to ORP the same constraint) 2022-11-14 18:30:43 +01:00
Claudio Atzori eb45ba7af0 extended mapping from ODF relations (PR#251) 2022-11-14 18:26:13 +01:00
Claudio Atzori a929dc5fee integrated changes for mapping ROHub contents in the Graph 2022-11-14 18:15:35 +01:00
Miriam Baglioni 5f9383b2d9 [EOSC TAG] remove reduntant check for jupyter notebook 2022-11-11 14:06:19 +01:00
Miriam Baglioni b18bbca8af [EOSC TAG] adding search in orp for jupyter notebook criteria 2022-11-11 12:42:58 +01:00
dimitrispie 55fa3b2a17 Hive memory parameters 2022-11-03 15:21:04 +01:00
Claudio Atzori 80c5e0f637 code formatting 2022-09-27 12:51:51 +02:00
Claudio Atzori c01d528ab2 suppressing hyper verbose spark logs during unit test execution 2022-09-23 15:19:50 +02:00
Claudio Atzori e6d788d27a [stats wf] adding missing changes lost in PR#248 2022-09-23 14:38:42 +02:00
Claudio Atzori 930f118673 fixed semantic (subreltype) for ServiceOrganization relations 2022-09-22 16:24:44 +02:00
Claudio Atzori b2c3071e72 Merge branch 'master' into beta2master_sept_2022 2022-09-22 14:39:15 +02:00
Claudio Atzori 10ec074f79 Merge remote-tracking branch 'antonis.lempesis/beta' into beta2master_sept_2022 2022-09-22 14:12:19 +02:00
Claudio Atzori 7225fe9cbe integrated changes from discard-non-wellformed 2022-09-22 10:06:07 +02:00
Miriam Baglioni 869e129288 [EOSC BulkTag] refactoring 2022-09-20 16:13:18 +02:00
Miriam Baglioni 840465958b [EOSC BulkTag] filtering aout the datasources registered in the eosc with compatibility different from 3.0, 4.0 for literature, data and CRIS to add the context eosc to the results 2022-09-20 10:30:41 +02:00
Claudio Atzori bdc8f993d0 [Patch Hosted By] check also the presence of datasource.officialname.value 2022-09-19 15:28:03 +02:00
Miriam Baglioni ec87149cb3 [Patch Hosted By] added fix to avoi NPE error when datasource official name is not provided. Removing datasources if no officialname has been provided 2022-09-19 14:06:52 +02:00
Miriam Baglioni b42e2c9df6 [Patch Hosted By] added fix to avoi NPE error when datasource official name is not provided 2022-09-19 12:30:32 +02:00
Miriam Baglioni 1329aa8479 [EOSC BulkTag] modified test to remove association of result to eosc when eoscifguidelines are set 2022-09-19 11:59:48 +02:00
Miriam Baglioni a0ee1a8640 [EOSC BulkTag] remove addition of eosc context for result with eosc if guidelines set 2022-09-19 11:44:10 +02:00
Claudio Atzori 96062164f9 Merge pull request '[Aggregator graph|master] Discard invalid records' (#245) from discard-non-wellformed into master
Reviewed-on: D-Net/dnet-hadoop#245
2022-09-19 09:48:16 +02:00
Claudio Atzori 35bb7c423f updated dhp-schemas version to 2.12.1 2022-09-16 16:13:15 +02:00
Claudio Atzori fd87571506 code formatting 2022-09-16 16:05:03 +02:00
Claudio Atzori c527112e33 Merge commit 'ff6f789b6d9be0567b6ad72f8a0e75fe3f52726a' into beta2master_sept_2022 2022-09-16 15:59:10 +02:00
Claudio Atzori 65209359bc Merge commit 'b5f7bd30be7f7adaaa28170740da0484b50a77ed' into beta2master_sept_2022 2022-09-16 15:58:11 +02:00
Claudio Atzori d72a64ded3 Merge commit '690be4482fc84327dc7617acbc8d976d559df512' into beta2master_sept_2022 2022-09-16 15:57:44 +02:00
Claudio Atzori 3e8499ce47 Merge commit '71b069ca90a2f7ec09d64241c60917d3636fc81e' into beta2master_sept_2022 2022-09-16 15:57:20 +02:00
Claudio Atzori 61aacb3271 Merge commit '1203378441dc6d8e8435cacd42e76e11746f6d1b' into beta2master_sept_2022 2022-09-16 15:56:55 +02:00
Claudio Atzori dbb567251a merged 853c996fa2 2022-09-16 15:56:28 +02:00
Claudio Atzori c7e8ad853e Merge commit '2b5f8c9c9a3611c57ee5febfe262a455a39ad801' into beta2master_sept_2022 2022-09-16 15:55:04 +02:00
Claudio Atzori 0849ebfd80 merged a11eb38065 2022-09-16 15:54:32 +02:00
Claudio Atzori 281239249e Merge commit 'b7c387c21f946adbc9da90ded95166205195edb0' into beta2master_sept_2022 2022-09-16 15:49:20 +02:00
Claudio Atzori 45fc5e12be Merge commit 'cb7c07c54e59675e8dffe42b7f2a13f16c956068' into beta2master_sept_2022 2022-09-16 15:48:55 +02:00
Claudio Atzori 1c05aaaa2e Merge commit '3418ce50ac9b28fed4fa949919e6c8208738cdcf' into beta2master_sept_2022 2022-09-16 15:48:36 +02:00
Claudio Atzori 01d5ad6361 Merge commit 'd85ba3c1a9d7f0e80565742161ff6c9ecffd52b7' into beta2master_sept_2022 2022-09-16 15:48:16 +02:00
Claudio Atzori d872d1cdd9 Merge commit 'a4815f6bec87f05be8cd740d236707949a0f746e' into beta2master_sept_2022 2022-09-16 15:47:49 +02:00
Claudio Atzori ab0efecab4 Merge commit '84598c75356cf580de6c81653a9351e9b8173639' into beta2master_sept_2022 2022-09-16 15:47:05 +02:00
Claudio Atzori 725c3c68d0 Merge commit '844f6eb46533cdd4be3210401b10401322079640' into beta2master_sept_2022 2022-09-16 15:46:40 +02:00
Claudio Atzori 300ae6221c Merge commit '32cee1f619eb30d2e2ac6083435b76b1aba7db09' into beta2master_sept_2022 2022-09-16 15:45:57 +02:00
Claudio Atzori 0ec2eaba35 Merge commit 'c1f2ffc53dc41f1fac3855b2d2df7d6a5ea15e3e' into beta2master_sept_2022 2022-09-16 15:45:27 +02:00
Claudio Atzori a387807d43 Merge commit 'b78889a0ce27a79c7ab2d8da05b118ee4f1bcb36' into beta2master_sept_2022 2022-09-16 15:44:17 +02:00
Claudio Atzori 2abe2bc137 Merge commit '08ce2cadc2d84aa982726e429c280a905536a715' into beta2master_sept_2022 2022-09-16 15:43:49 +02:00
Claudio Atzori a07c876922 Merge commit '27a91841e7fa2a1b615b4d1e161d606db5bead96' into beta2master_sept_2022 2022-09-16 15:43:02 +02:00
Claudio Atzori cbd48bc645 Merge commit 'efd96e7e664e4139321e35e8d172b884ba4b61a1' into beta2master_sept_2022 2022-09-16 15:38:56 +02:00
36 changed files with 3365 additions and 261 deletions

1
.gitignore vendored
View File

@ -26,3 +26,4 @@ spark-warehouse
/**/*.log
/**/.factorypath
/**/.scalafmt.conf
/.java-version

128
README.md
View File

@ -1,2 +1,128 @@
# dnet-hadoop
Dnet-hadoop is the project that defined all the OOZIE workflows for the OpenAIRE Graph construction, processing, provisioning.
Dnet-hadoop is the project that defined all the [OOZIE workflows](https://oozie.apache.org/) for the OpenAIRE Graph construction, processing, provisioning.
How to build, package and run oozie workflows
====================
Oozie-installer is a utility allowing building, uploading and running oozie workflows. In practice, it creates a `*.tar.gz`
package that contains resources that define a workflow and some helper scripts.
This module is automatically executed when running:
`mvn package -Poozie-package -Dworkflow.source.dir=classpath/to/parent/directory/of/oozie_app`
on module having set:
```
<parent>
<groupId>eu.dnetlib.dhp</groupId>
<artifactId>dhp-workflows</artifactId>
</parent>
```
in `pom.xml` file. `oozie-package` profile initializes oozie workflow packaging, `workflow.source.dir` property points to
a workflow (notice: this is not a relative path but a classpath to directory usually holding `oozie_app` subdirectory).
The outcome of this packaging is `oozie-package.tar.gz` file containing inside all the resources required to run Oozie workflow:
- jar packages
- workflow definitions
- job properties
- maintenance scripts
Required properties
====================
In order to include proper workflow within package, `workflow.source.dir` property has to be set. It could be provided
by setting `-Dworkflow.source.dir=some/job/dir` maven parameter.
In oder to define full set of cluster environment properties one should create `~/.dhp/application.properties` file with
the following properties:
- `dhp.hadoop.frontend.user.name` - your user name on hadoop cluster and frontend machine
- `dhp.hadoop.frontend.host.name` - frontend host name
- `dhp.hadoop.frontend.temp.dir` - frontend directory for temporary files
- `dhp.hadoop.frontend.port.ssh` - frontend machine ssh port
- `oozieServiceLoc` - oozie service location required by run_workflow.sh script executing oozie job
- `nameNode` - name node address
- `jobTracker` - job tracker address
- `oozie.execution.log.file.location` - location of file that will be created when executing oozie job, it contains output
produced by `run_workflow.sh` script (needed to obtain oozie job id)
- `maven.executable` - mvn command location, requires parameterization due to a different setup of CI cluster
- `sparkDriverMemory` - amount of memory assigned to spark jobs driver
- `sparkExecutorMemory` - amount of memory assigned to spark jobs executors
- `sparkExecutorCores` - number of cores assigned to spark jobs executors
All values will be overriden with the ones from `job.properties` and eventually `job-override.properties` stored in module's
main folder.
When overriding properties from `job.properties`, `job-override.properties` file can be created in main module directory
(the one containing `pom.xml` file) and define all new properties which will override existing properties.
One can provide those properties one by one as command line `-D` arguments.
Properties overriding order is the following:
1. `pom.xml` defined properties (located in the project root dir)
2. `~/.dhp/application.properties` defined properties
3. `${workflow.source.dir}/job.properties`
4. `job-override.properties` (located in the project root dir)
5. `maven -Dparam=value`
where the maven `-Dparam` property is overriding all the other ones.
Workflow definition requirements
====================
`workflow.source.dir` property should point to the following directory structure:
[${workflow.source.dir}]
|
|-job.properties (optional)
|
\-[oozie_app]
|
\-workflow.xml
This property can be set using maven `-D` switch.
`[oozie_app]` is the default directory name however it can be set to any value as soon as `oozieAppDir` property is
provided with directory name as value.
Sub-workflows are supported as well and sub-workflow directories should be nested within `[oozie_app]` directory.
Creating oozie installer step-by-step
=====================================
Automated oozie-installer steps are the following:
1. creating jar packages: `*.jar` and `*tests.jar` along with copying all dependencies in `target/dependencies`
2. reading properties from maven, `~/.dhp/application.properties`, `job.properties`, `job-override.properties`
3. invoking priming mechanism linking resources from import.txt file (currently resolving subworkflow resources)
4. assembling shell scripts for preparing Hadoop filesystem, uploading Oozie application and starting workflow
5. copying whole `${workflow.source.dir}` content to `target/${oozie.package.file.name}`
6. generating updated `job.properties` file in `target/${oozie.package.file.name}` based on maven,
`~/.dhp/application.properties`, `job.properties` and `job-override.properties`
7. creating `lib` directory (or multiple directories for sub-workflows for each nested directory) and copying jar packages
created at step (1) to each one of them
8. bundling whole `${oozie.package.file.name}` directory into single tar.gz package
Uploading oozie package and running workflow on cluster
=======================================================
In order to simplify deployment and execution process two dedicated profiles were introduced:
- `deploy`
- `run`
to be used along with `oozie-package` profile e.g. by providing `-Poozie-package,deploy,run` maven parameters.
The `deploy` profile supplements packaging process with:
1) uploading oozie-package via scp to `/home/${user.name}/oozie-packages` directory on `${dhp.hadoop.frontend.host.name}` machine
2) extracting uploaded package
3) uploading oozie content to hadoop cluster HDFS location defined in `oozie.wf.application.path` property (generated dynamically by maven build process, based on `${dhp.hadoop.frontend.user.name}` and `workflow.source.dir` properties)
The `run` profile introduces:
1) executing oozie application uploaded to HDFS cluster using `deploy` command. Triggers `run_workflow.sh` script providing runtime properties defined in `job.properties` file.
Notice: ssh access to frontend machine has to be configured on system level and it is preferable to set key-based authentication in order to simplify remote operations.

View File

@ -47,17 +47,14 @@ public class DispatchEntitiesSparkJob {
String outputPath = parser.get("outputPath");
log.info("outputPath: {}", outputPath);
boolean filterInvisible = Boolean.valueOf(parser.get("filterInvisible"));
boolean filterInvisible = Boolean.parseBoolean(parser.get("filterInvisible"));
log.info("filterInvisible: {}", filterInvisible);
SparkConf conf = new SparkConf();
runWithSparkSession(
conf,
isSparkSessionManaged,
spark -> {
HdfsSupport.remove(outputPath, spark.sparkContext().hadoopConfiguration());
dispatchEntities(spark, inputPath, outputPath, filterInvisible);
});
spark -> dispatchEntities(spark, inputPath, outputPath, filterInvisible));
}
private static void dispatchEntities(
@ -72,7 +69,9 @@ public class DispatchEntitiesSparkJob {
String entityType = entry.getKey();
Class<?> clazz = entry.getValue();
final String entityPath = outputPath + "/" + entityType;
if (!entityType.equalsIgnoreCase("relation")) {
HdfsSupport.remove(entityPath, spark.sparkContext().hadoopConfiguration());
Dataset<Row> entityDF = spark
.read()
.schema(Encoders.bean(clazz).schema())
@ -91,7 +90,7 @@ public class DispatchEntitiesSparkJob {
.write()
.mode(SaveMode.Overwrite)
.option("compression", "gzip")
.json(outputPath + "/" + entityType);
.json(entityPath);
}
});
}

View File

@ -7,7 +7,7 @@ import java.util.regex.Pattern;
// https://researchguides.stevens.edu/c.php?g=442331&p=6577176
public class PmidCleaningRule {
public static final Pattern PATTERN = Pattern.compile("[1-9]{1,8}");
public static final Pattern PATTERN = Pattern.compile("0*(\\d{1,8})");
public static String clean(String pmid) {
String s = pmid
@ -17,7 +17,7 @@ public class PmidCleaningRule {
final Matcher m = PATTERN.matcher(s);
if (m.find()) {
return m.group();
return m.group(1);
}
return "";
}

View File

@ -9,10 +9,16 @@ class PmidCleaningRuleTest {
@Test
void testCleaning() {
// leading zeros are removed
assertEquals("1234", PmidCleaningRule.clean("01234"));
// tolerant to spaces in the middle
assertEquals("1234567", PmidCleaningRule.clean("0123 4567"));
// stop parsing at first not numerical char
assertEquals("123", PmidCleaningRule.clean("0123x4567"));
// invalid id leading to empty result
assertEquals("", PmidCleaningRule.clean("abc"));
// valid id with zeroes in the number
assertEquals("20794075", PmidCleaningRule.clean("20794075"));
}
}

View File

@ -81,7 +81,7 @@ case class SparkModel(conf: DedupConfig) {
MapDocumentUtil.truncateList(
MapDocumentUtil.getJPathList(fdef.getPath, documentContext, fdef.getType),
fdef.getSize
).toArray
).asScala
case Type.StringConcat =>
val jpaths = CONCAT_REGEX.split(fdef.getPath)

View File

@ -1,6 +1,24 @@
package eu.dnetlib.pace.util;
/*
* Diff Match and Patch
* Copyright 2018 The diff-match-patch Authors.
* https://github.com/google/diff-match-patch
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Diff Match and Patch
* Copyright 2018 The diff-match-patch Authors.

View File

@ -117,6 +117,11 @@ public class MapDocumentUtil {
return result;
}
if (type == Type.List && jresult instanceof List) {
((List<?>) jresult).forEach(x -> result.add(x.toString()));
return result;
}
if (jresult instanceof JSONArray) {
((JSONArray) jresult).forEach(it -> {
try {

View File

@ -0,0 +1,72 @@
# Action Management Framework
This module implements the oozie workflow for the integration of pre-built contents into the OpenAIRE Graph.
Such contents can be
* brand new, non-existing records to be introduced as nodes of the graph
* updates (or enrichment) for records that does exist in the graph (e.g. a new subject term for a publication)
* relations among existing nodes
The actionset contents are organised into logical containers, each of them can contain multiple versions contents and is characterised by
* a name
* an identifier
* the paths on HDFS where each version of the contents is stored
Each version is then characterised by
* the creation date
* the last update date
* the indication where it is the latest one or it is an expired version, candidate for garbage collection
## ActionSet serialization
Each actionset version contains records compliant to the graph internal data model, i.e. subclasses of `eu.dnetlib.dhp.schema.oaf.Oaf`,
defined in the external schemas module
```
<dependency>
<groupId>eu.dnetlib.dhp</groupId>
<artifactId>${dhp-schemas.artifact}</artifactId>
<version>${dhp-schemas.version}</version>
</dependency>
```
When the actionset contains a relationship, the model class to use is `eu.dnetlib.dhp.schema.oaf.Relation`, otherwise
when the actionset contains an entity, it is a `eu.dnetlib.dhp.schema.oaf.OafEntity` or one of its subclasses
`Datasource`, `Organization`, `Project`, `Result` (or one of its subclasses `Publication`, `Dataset`, etc...).
Then, each OpenAIRE Graph model class instance must be wrapped using the class `eu.dnetlib.dhp.schema.action.AtomicAction`, a generic
container that defines two attributes
* `T payload` the OpenAIRE Graph class instance containing the data;
* `Class<T> clazz` must contain the class whose instance is contained in the payload.
Each AtomicAction can be then serialised in JSON format using `com.fasterxml.jackson.databind.ObjectMapper` from
```
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-databind</artifactId>
<version>${dhp.jackson.version}</version>
</dependency>
```
Then, the JSON serialization must be stored as a GZip compressed sequence file (`org.apache.hadoop.mapred.SequenceFileOutputFormat`).
As such, it contains a set of tuples, a key and a value defined as `org.apache.hadoop.io.Text` where
* the `key` must be set to the class canonical name contained in the `AtomicAction`;
* the `value` must be set to the AtomicAction JSON serialization.
The following snippet provides an example of how create an actionset version of Relation records:
```
rels // JavaRDD<Relation>
.map(relation -> new AtomicAction<Relation>(Relation.class, relation))
.mapToPair(
aa -> new Tuple2<>(new Text(aa.getClazz().getCanonicalName()),
new Text(OBJECT_MAPPER.writeValueAsString(aa))))
.saveAsHadoopFile(outputPath, Text.class, Text.class, SequenceFileOutputFormat.class, GzipCodec.class);
```

View File

@ -1,4 +1,9 @@
{
"ETHZ.UNIGENF": {
"openaire_id": "opendoar____::1400",
"datacite_name": "Uni Genf",
"official_name": "Archive ouverte UNIGE"
},
"GESIS.RKI": {
"openaire_id": "re3data_____::r3d100010436",
"datacite_name": "Forschungsdatenzentrum am Robert Koch Institut",

View File

@ -222,7 +222,7 @@ object BioDBToOAF {
def uniprotToOAF(input: String): List[Oaf] = {
implicit lazy val formats: DefaultFormats.type = org.json4s.DefaultFormats
lazy val json = parse(input)
val pid = (json \ "pid").extract[String]
val pid = (json \ "pid").extract[String].trim()
val d = new Dataset

View File

@ -1,15 +1,44 @@
{"pdb": "1CW0", "title": "crystal structure analysis of very short patch repair (vsr) endonuclease in complex with a duplex dna", "authors": ["S.E.Tsutakawa", "H.Jingami", "K.Morikawa"], "doi": "10.1016/S0092-8674(00)81550-0", "pmid": "10612397"}
{"pdb": "2CWW", "title": "crystal structure of thermus thermophilus ttha1280, a putative sam- dependent rna methyltransferase, in complex with s-adenosyl-l- homocysteine", "authors": ["A.A.Pioszak", "K.Murayama", "N.Nakagawa", "A.Ebihara", "S.Kuramitsu", "M.Shirouzu", "S.Yokoyama", "Riken Structural Genomics/proteomics Initiative (Rsgi)"], "doi": "10.1107/S1744309105029842", "pmid": "16511182"}
{"pdb": "6CWE", "title": "structure of alpha-gsa[8,6p] bound by cd1d and in complex with the va14vb8.2 tcr", "authors": ["J.Wang", "D.Zajonc"], "doi": null, "pmid": null}
{"pdb": "5CWS", "title": "crystal structure of the intact chaetomium thermophilum nsp1-nup49- nup57 channel nucleoporin heterotrimer bound to its nic96 nuclear pore complex attachment site", "authors": ["C.J.Bley", "S.Petrovic", "M.Paduch", "V.Lu", "A.A.Kossiakoff", "A.Hoelz"], "doi": "10.1126/SCIENCE.AAC9176", "pmid": "26316600"}
{"pdb": "5CWE", "title": "structure of cyp107l2 from streptomyces avermitilis with lauric acid", "authors": ["T.-V.Pham", "S.-H.Han", "J.-H.Kim", "D.-H.Kim", "L.-W.Kang"], "doi": null, "pmid": null}
{"pdb": "7CW4", "title": "acetyl-coa acetyltransferase from bacillus cereus atcc 14579", "authors": ["J.Hong", "K.J.Kim"], "doi": "10.1016/J.BBRC.2020.09.048", "pmid": "32972748"}
{"pdb": "2CWP", "title": "crystal structure of metrs related protein from pyrococcus horikoshii", "authors": ["K.Murayama", "M.Kato-Murayama", "M.Shirouzu", "S.Yokoyama", "Riken StructuralGenomics/proteomics Initiative (Rsgi)"], "doi": null, "pmid": null}
{"pdb": "2CW7", "title": "crystal structure of intein homing endonuclease ii", "authors": ["H.Matsumura", "H.Takahashi", "T.Inoue", "H.Hashimoto", "M.Nishioka", "S.Fujiwara", "M.Takagi", "T.Imanaka", "Y.Kai"], "doi": "10.1002/PROT.20858", "pmid": "16493661"}
{"pdb": "1CWU", "title": "brassica napus enoyl acp reductase a138g mutant complexed with nad+ and thienodiazaborine", "authors": ["A.Roujeinikova", "J.B.Rafferty", "D.W.Rice"], "doi": "10.1074/JBC.274.43.30811", "pmid": "10521472"}
{"pdb": "3CWN", "title": "escherichia coli transaldolase b mutant f178y", "authors": ["T.Sandalova", "G.Schneider", "A.Samland"], "doi": "10.1074/JBC.M803184200", "pmid": "18687684"}
{"pdb": "1CWL", "title": "human cyclophilin a complexed with 4 4-hydroxy-meleu cyclosporin", "authors": ["V.Mikol", "J.Kallen", "P.Taylor", "M.D.Walkinshaw"], "doi": "10.1006/JMBI.1998.2108", "pmid": "9769216"}
{"pdb": "3CW2", "title": "crystal structure of the intact archaeal translation initiation factor 2 from sulfolobus solfataricus .", "authors": ["E.A.Stolboushkina", "S.V.Nikonov", "A.D.Nikulin", "U.Blaesi", "D.J.Manstein", "R.V.Fedorov", "M.B.Garber", "O.S.Nikonov"], "doi": "10.1016/J.JMB.2008.07.039", "pmid": "18675278"}
{"pdb": "3CW9", "title": "4-chlorobenzoyl-coa ligase/synthetase in the thioester-forming conformation, bound to 4-chlorophenacyl-coa", "authors": ["A.S.Reger", "J.Cao", "R.Wu", "D.Dunaway-Mariano", "A.M.Gulick"], "doi": "10.1021/BI800696Y", "pmid": "18620418"}
{"pdb": "3CWU", "title": "crystal structure of an alka host/guest complex 2'-fluoro-2'-deoxy-1, n6-ethenoadenine:thymine base pair", "authors": ["B.R.Bowman", "S.Lee", "S.Wang", "G.L.Verdine"], "doi": "10.1016/J.STR.2008.04.012", "pmid": "18682218"}
{"pdb": "5CWF", "title": "crystal structure of de novo designed helical repeat protein dhr8", "authors": ["G.Bhabha", "D.C.Ekiert"], "doi": "10.1038/NATURE16162", "pmid": "26675729"}
{"classification": "Signaling protein", "pdb": "5NM4", "deposition_date": "2017-04-05", "title": "A2a adenosine receptor room-temperature structure determined by serial Femtosecond crystallography", "Keywords": ["Oom-temperature", " serial crystallography", " signaling protein"], "authors": ["T.weinert", "R.cheng", "D.james", "D.gashi", "P.nogly", "K.jaeger", "M.hennig", "", "J.standfuss"], "pmid": "28912485", "doi": "10.1038/S41467-017-00630-4"}
{"classification": "Oxidoreductase/oxidoreductase inhibitor", "pdb": "4KN3", "deposition_date": "2013-05-08", "title": "Structure of the y34ns91g double mutant of dehaloperoxidase from Amphitrite ornata with 2,4,6-trichlorophenol", "Keywords": ["Lobin", " oxygen storage", " peroxidase", " oxidoreductase", " oxidoreductase-", "Oxidoreductase inhibitor complex"], "authors": ["C.wang", "L.lovelace", "L.lebioda"], "pmid": "23952341", "doi": "10.1021/BI400627W"}
{"classification": "Transport protein", "pdb": "8HKM", "deposition_date": "2022-11-27", "title": "Ion channel", "Keywords": ["On channel", " transport protein"], "authors": ["D.h.jiang", "J.t.zhang"], "pmid": "37494189", "doi": "10.1016/J.CELREP.2023.112858"}
{"classification": "Signaling protein", "pdb": "6JT1", "deposition_date": "2019-04-08", "title": "Structure of human soluble guanylate cyclase in the heme oxidised State", "Keywords": ["Oluble guanylate cyclase", " signaling protein"], "authors": ["L.chen", "Y.kang", "R.liu", "J.-x.wu"], "pmid": "31514202", "doi": "10.1038/S41586-019-1584-6"}
{"classification": "Immune system", "pdb": "7OW6", "deposition_date": "2021-06-16", "title": "Crystal structure of a tcr in complex with hla-a*11:01 bound to kras G12d peptide (vvvgadgvgk)", "Keywords": ["La", " kras", " tcr", " immune system"], "authors": ["V.karuppiah", "R.a.robinson"], "doi": "10.1038/S41467-022-32811-1"}
{"classification": "Biosynthetic protein", "pdb": "5EQ8", "deposition_date": "2015-11-12", "title": "Crystal structure of medicago truncatula histidinol-phosphate Phosphatase (mthpp) in complex with l-histidinol", "Keywords": ["Istidine biosynthesis", " metabolic pathways", " dimer", " plant", "", "Biosynthetic protein"], "authors": ["M.ruszkowski", "Z.dauter"], "pmid": "26994138", "doi": "10.1074/JBC.M115.708727"}
{"classification": "De novo protein", "pdb": "8CWA", "deposition_date": "2022-05-18", "title": "Solution nmr structure of 8-residue rosetta-designed cyclic peptide D8.21 in cdcl3 with cis/trans switching (tc conformation, 53%)", "Keywords": ["Yclic peptide", " non natural amino acids", " cis/trans", " switch peptides", "", "De novo design", "Membrane permeability", "De novo protein"], "authors": ["T.a.ramelot", "R.tejero", "G.t.montelione"], "pmid": "36041435", "doi": "10.1016/J.CELL.2022.07.019"}
{"classification": "Hydrolase", "pdb": "3R6M", "deposition_date": "2011-03-21", "title": "Crystal structure of vibrio parahaemolyticus yeaz", "Keywords": ["Ctin/hsp70 nucleotide-binding fold", " bacterial resuscitation", " viable", "But non-culturable state", "Resuscitation promoting factor", "Ygjd", "", "Yjee", "Vibrio parahaemolyticus", "Hydrolase"], "authors": ["A.roujeinikova", "I.aydin"], "pmid": "21858042", "doi": "10.1371/JOURNAL.PONE.0023245"}
{"classification": "Hydrolase", "pdb": "2W5J", "deposition_date": "2008-12-10", "title": "Structure of the c14-rotor ring of the proton translocating Chloroplast atp synthase", "Keywords": ["Ydrolase", " chloroplast", " atp synthase", " lipid-binding", " cf(0)", " membrane", "", "Transport", "Formylation", "Energy transduction", "Hydrogen ion transport", "", "Ion transport", "Transmembrane", "Membrane protein"], "authors": ["M.vollmar", "D.schlieper", "M.winn", "C.buechner", "G.groth"], "pmid": "19423706", "doi": "10.1074/JBC.M109.006916"}
{"classification": "De novo protein", "pdb": "4GLU", "deposition_date": "2012-08-14", "title": "Crystal structure of the mirror image form of vegf-a", "Keywords": ["-protein", " covalent dimer", " cysteine knot protein", " growth factor", " de", "Novo protein"], "authors": ["K.mandal", "M.uppalapati", "D.ault-riche", "J.kenney", "J.lowitz", "S.sidhu", "", "S.b.h.kent"], "pmid": "22927390", "doi": "10.1073/PNAS.1210483109"}
{"classification": "Hydrolase/hydrolase inhibitor", "pdb": "3WYL", "deposition_date": "2014-09-01", "title": "Crystal structure of the catalytic domain of pde10a complexed with 5- Methoxy-3-(1-phenyl-1h-pyrazol-5-yl)-1-(3-(trifluoromethyl)phenyl) Pyridazin-4(1h)-one", "Keywords": ["Ydrolase-hydrolase inhibitor complex"], "authors": ["H.oki", "Y.hayano"], "pmid": "25384088", "doi": "10.1021/JM5013648"}
{"classification": "Isomerase", "pdb": "5BOR", "deposition_date": "2015-05-27", "title": "Structure of acetobacter aceti pure-s57c, sulfonate form", "Keywords": ["Cidophile", " pure", " purine biosynthesis", " isomerase"], "authors": ["K.l.sullivan", "T.j.kappock"]}
{"classification": "Hydrolase", "pdb": "1X0C", "deposition_date": "2005-03-17", "title": "Improved crystal structure of isopullulanase from aspergillus niger Atcc 9642", "Keywords": ["Ullulan", " glycoside hydrolase family 49", " glycoprotein", " hydrolase"], "authors": ["M.mizuno", "T.tonozuka", "A.yamamura", "Y.miyasaka", "H.akeboshi", "S.kamitori", "", "A.nishikawa", "Y.sakano"], "pmid": "18155243", "doi": "10.1016/J.JMB.2007.11.098"}
{"classification": "Oxidoreductase", "pdb": "7CUP", "deposition_date": "2020-08-23", "title": "Structure of 2,5-dihydroxypridine dioxygenase from pseudomonas putida Kt2440", "Keywords": ["On-heme dioxygenase", " oxidoreductase"], "authors": ["G.q.liu", "H.z.tang"]}
{"classification": "Ligase", "pdb": "1VCN", "deposition_date": "2004-03-10", "title": "Crystal structure of t.th. hb8 ctp synthetase complex with sulfate Anion", "Keywords": ["Etramer", " riken structural genomics/proteomics initiative", " rsgi", "", "Structural genomics", "Ligase"], "authors": ["M.goto", "Riken structural genomics/proteomics initiative (rsgi)"], "pmid": "15296735", "doi": "10.1016/J.STR.2004.05.013"}
{"classification": "Transferase/transferase inhibitor", "pdb": "6C9V", "deposition_date": "2018-01-28", "title": "Mycobacterium tuberculosis adenosine kinase bound to (2r,3s,4r,5r)-2- (hydroxymethyl)-5-(6-(4-phenylpiperazin-1-yl)-9h-purin-9-yl) Tetrahydrofuran-3,4-diol", "Keywords": ["Ucleoside analog", " complex", " inhibitor", " structural genomics", " psi-2", "", "Protein structure initiative", "Tb structural genomics consortium", "", "Tbsgc", "Transferase-transferase inhibitor complex"], "authors": ["R.a.crespo", "Tb structural genomics consortium (tbsgc)"], "pmid": "31002508", "doi": "10.1021/ACS.JMEDCHEM.9B00020"}
{"classification": "De novo protein", "pdb": "4LPY", "deposition_date": "2013-07-16", "title": "Crystal structure of tencon variant g10", "Keywords": ["Ibronectin type iii fold", " alternate scaffold", " de novo protein"], "authors": ["A.teplyakov", "G.obmolova", "G.l.gilliland"], "pmid": "24375666", "doi": "10.1002/PROT.24502"}
{"classification": "Isomerase", "pdb": "2Y88", "deposition_date": "2011-02-03", "title": "Crystal structure of mycobacterium tuberculosis phosphoribosyl Isomerase (variant d11n) with bound prfar", "Keywords": ["Romatic amino acid biosynthesis", " isomerase", " tim-barrel", " histidine", "Biosynthesis", "Tryptophan biosynthesis"], "authors": ["J.kuper", "A.v.due", "A.geerlof", "M.wilmanns"], "pmid": "21321225", "doi": "10.1073/PNAS.1015996108"}
{"classification": "Unknown function", "pdb": "1SR0", "deposition_date": "2004-03-22", "title": "Crystal structure of signalling protein from sheep(sps-40) at 3.0a Resolution using crystal grown in the presence of polysaccharides", "Keywords": ["Ignalling protein", " involution", " unknown function"], "authors": ["D.b.srivastava", "A.s.ethayathulla", "N.singh", "J.kumar", "S.sharma", "T.p.singh"]}
{"classification": "Dna binding protein", "pdb": "3RH2", "deposition_date": "2011-04-11", "title": "Crystal structure of a tetr-like transcriptional regulator (sama_0099) From shewanella amazonensis sb2b at 2.42 a resolution", "Keywords": ["Na/rna-binding 3-helical bundle", " structural genomics", " joint center", "For structural genomics", "Jcsg", "Protein structure initiative", "Psi-", "Biology", "Dna binding protein"], "authors": ["Joint center for structural genomics (jcsg)"]}
{"classification": "Transferase", "pdb": "2WK5", "deposition_date": "2009-06-05", "title": "Structural features of native human thymidine phosphorylase And in complex with 5-iodouracil", "Keywords": ["Lycosyltransferase", " developmental protein", " angiogenesis", "", "5-iodouracil", "Growth factor", "Enzyme kinetics", "", "Differentiation", "Disease mutation", "Thymidine", "Phosphorylase", "Chemotaxis", "Transferase", "Mutagenesis", "", "Polymorphism"], "authors": ["E.mitsiki", "A.c.papageorgiou", "S.iyer", "N.thiyagarajan", "S.h.prior", "", "D.sleep", "C.finnis", "K.r.acharya"], "pmid": "19555658", "doi": "10.1016/J.BBRC.2009.06.104"}
{"classification": "Hydrolase", "pdb": "3P9Y", "deposition_date": "2010-10-18", "title": "Crystal structure of the drosophila melanogaster ssu72-pctd complex", "Keywords": ["Hosphatase", " cis proline", " lmw ptp-like fold", " rna polymerase ii ctd", "", "Hydrolase"], "authors": ["J.w.werner-allen", "P.zhou"], "pmid": "21159777", "doi": "10.1074/JBC.M110.197129"}
{"classification": "Recombination/dna", "pdb": "6OEO", "deposition_date": "2019-03-27", "title": "Cryo-em structure of mouse rag1/2 nfc complex (dna1)", "Keywords": ["(d)j recombination", " dna transposition", " rag", " scid", " recombination", "", "Recombination-dna complex"], "authors": ["X.chen", "Y.cui", "Z.h.zhou", "W.yang", "M.gellert"], "pmid": "32015552", "doi": "10.1038/S41594-019-0363-2"}
{"classification": "Hydrolase", "pdb": "4ECA", "deposition_date": "1997-02-21", "title": "Asparaginase from e. coli, mutant t89v with covalently bound aspartate", "Keywords": ["Ydrolase", " acyl-enzyme intermediate", " threonine amidohydrolase"], "authors": ["G.j.palm", "J.lubkowski", "A.wlodawer"], "pmid": "8706862", "doi": "10.1016/0014-5793(96)00660-6"}
{"classification": "Transcription/protein binding", "pdb": "3UVX", "deposition_date": "2011-11-30", "title": "Crystal structure of the first bromodomain of human brd4 in complex With a diacetylated histone 4 peptide (h4k12ack16ac)", "Keywords": ["Romodomain", " bromodomain containing protein 4", " cap", " hunk1", " mcap", "", "Mitotic chromosome associated protein", "Peptide complex", "Structural", "Genomics consortium", "Sgc", "Transcription-protein binding complex"], "authors": ["P.filippakopoulos", "S.picaud", "T.keates", "E.ugochukwu", "F.von delft", "", "C.h.arrowsmith", "A.m.edwards", "J.weigelt", "C.bountra", "S.knapp", "Structural", "Genomics consortium (sgc)"], "pmid": "22464331", "doi": "10.1016/J.CELL.2012.02.013"}
{"classification": "Membrane protein", "pdb": "1TLZ", "deposition_date": "2004-06-10", "title": "Tsx structure complexed with uridine", "Keywords": ["Ucleoside transporter", " beta barrel", " uridine", " membrane", "Protein"], "authors": ["J.ye", "B.van den berg"], "pmid": "15272310", "doi": "10.1038/SJ.EMBOJ.7600330"}
{"classification": "Dna binding protein", "pdb": "7AZD", "deposition_date": "2020-11-16", "title": "Dna polymerase sliding clamp from escherichia coli with peptide 20 Bound", "Keywords": ["Ntibacterial drug", " dna binding protein"], "authors": ["C.monsarrat", "G.compain", "C.andre", "I.martiel", "S.engilberge", "V.olieric", "", "P.wolff", "K.brillet", "M.landolfo", "C.silva da veiga", "J.wagner", "G.guichard", "", "D.y.burnouf"], "pmid": "34806883", "doi": "10.1021/ACS.JMEDCHEM.1C00918"}
{"classification": "Transferase", "pdb": "5N3K", "deposition_date": "2017-02-08", "title": "Camp-dependent protein kinase a from cricetulus griseus in complex With fragment like molecule o-guanidino-l-homoserine", "Keywords": ["Ragment", " complex", " transferase", " serine threonine kinase", " camp", "", "Kinase", "Pka"], "authors": ["C.siefker", "A.heine", "G.klebe"]}
{"classification": "Biosynthetic protein", "pdb": "8H52", "deposition_date": "2022-10-11", "title": "Crystal structure of helicobacter pylori carboxyspermidine Dehydrogenase in complex with nadp", "Keywords": ["Arboxyspermidine dehydrogenase", " biosynthetic protein"], "authors": ["K.y.ko", "S.c.park", "S.y.cho", "S.i.yoon"], "pmid": "36283333", "doi": "10.1016/J.BBRC.2022.10.049"}
{"classification": "Metal binding protein", "pdb": "6DYC", "deposition_date": "2018-07-01", "title": "Co(ii)-bound structure of the engineered cyt cb562 variant, ch3", "Keywords": ["Esigned protein", " 4-helix bundle", " electron transport", " metal binding", "Protein"], "authors": ["F.a.tezcan", "J.rittle"], "pmid": "30778140", "doi": "10.1038/S41557-019-0218-9"}
{"classification": "Protein fibril", "pdb": "6A6B", "deposition_date": "2018-06-27", "title": "Cryo-em structure of alpha-synuclein fiber", "Keywords": ["Lpha-syn fiber", " parkinson disease", " protein fibril"], "authors": ["Y.w.li", "C.y.zhao", "F.luo", "Z.liu", "X.gui", "Z.luo", "X.zhang", "D.li", "C.liu", "X.li"], "pmid": "30065316", "doi": "10.1038/S41422-018-0075-X"}
{"classification": "Dna", "pdb": "7D5E", "deposition_date": "2020-09-25", "title": "Left-handed g-quadruplex containing two bulges", "Keywords": ["-quadruplex", " bulge", " dna", " left-handed"], "authors": ["P.das", "A.maity", "K.h.ngo", "F.r.winnerdy", "B.bakalar", "Y.mechulam", "E.schmitt", "", "A.t.phan"], "pmid": "33503265", "doi": "10.1093/NAR/GKAA1259"}
{"classification": "Transferase", "pdb": "3RSY", "deposition_date": "2011-05-02", "title": "Cellobiose phosphorylase from cellulomonas uda in complex with sulfate And glycerol", "Keywords": ["H94", " alpha barrel", " cellobiose phosphorylase", " disaccharide", "Phosphorylase", "Transferase"], "authors": ["A.van hoorebeke", "J.stout", "W.soetaert", "J.van beeumen", "T.desmet", "S.savvides"]}
{"classification": "Oxidoreductase", "pdb": "7MCI", "deposition_date": "2021-04-02", "title": "Mofe protein from azotobacter vinelandii with a sulfur-replenished Cofactor", "Keywords": ["Zotobacter vinelandii", " mofe-protein", " nitrogenase", " oxidoreductase"], "authors": ["W.kang", "C.lee", "Y.hu", "M.w.ribbe"], "doi": "10.1038/S41929-022-00782-7"}
{"classification": "Dna", "pdb": "1XUW", "deposition_date": "2004-10-26", "title": "Structural rationalization of a large difference in rna affinity Despite a small difference in chemistry between two 2'-o-modified Nucleic acid analogs", "Keywords": ["Na mimetic methylcarbamate amide analog", " dna"], "authors": ["R.pattanayek", "L.sethaphong", "C.pan", "M.prhavc", "T.p.prakash", "M.manoharan", "", "M.egli"], "pmid": "15547979", "doi": "10.1021/JA044637K"}
{"classification": "Lyase", "pdb": "7C0D", "deposition_date": "2020-05-01", "title": "Crystal structure of azospirillum brasilense l-2-keto-3-deoxyarabonate Dehydratase (hydroxypyruvate-bound form)", "Keywords": ["-2-keto-3-deoxyarabonate dehydratase", " lyase"], "authors": ["Y.watanabe", "S.watanabe"], "pmid": "32697085", "doi": "10.1021/ACS.BIOCHEM.0C00515"}
{"classification": "Signaling protein", "pdb": "5LYK", "deposition_date": "2016-09-28", "title": "Crystal structure of intracellular b30.2 domain of btn3a1 bound to Citrate", "Keywords": ["30.2", " butyrophilin", " signaling protein"], "authors": ["F.mohammed", "A.t.baker", "M.salim", "B.e.willcox"], "pmid": "28862425", "doi": "10.1021/ACSCHEMBIO.7B00694"}
{"classification": "Toxin", "pdb": "4IZL", "deposition_date": "2013-01-30", "title": "Structure of the n248a mutant of the panton-valentine leucocidin s Component from staphylococcus aureus", "Keywords": ["I-component leucotoxin", " staphylococcus aureus", " s component", "Leucocidin", "Beta-barrel pore forming toxin", "Toxin"], "authors": ["L.maveyraud", "B.j.laventie", "G.prevost", "L.mourey"], "pmid": "24643034", "doi": "10.1371/JOURNAL.PONE.0092094"}
{"classification": "Dna", "pdb": "6F3C", "deposition_date": "2017-11-28", "title": "The cytotoxic [pt(h2bapbpy)] platinum complex interacting with the Cgtacg hexamer", "Keywords": ["Rug-dna complex", " four-way junction", " dna"], "authors": ["M.ferraroni", "C.bazzicalupi", "P.gratteri", "F.papi"], "pmid": "31046177", "doi": "10.1002/ANIE.201814532"}
{"classification": "Signaling protein/inhibitor", "pdb": "4L5M", "deposition_date": "2013-06-11", "title": "Complexe of arno sec7 domain with the protein-protein interaction Inhibitor n-(4-hydroxy-2,6-dimethylphenyl)benzenesulfonamide at ph6.5", "Keywords": ["Ec-7domain", " signaling protein-inhibitor complex"], "authors": ["F.hoh", "J.rouhana"], "pmid": "24112024", "doi": "10.1021/JM4009357"}
{"classification": "Signaling protein", "pdb": "5I6J", "deposition_date": "2016-02-16", "title": "Crystal structure of srgap2 f-barx", "Keywords": ["Rgap2", " f-bar", " fx", " signaling protein"], "authors": ["M.sporny", "J.guez-haddad", "M.n.isupov", "Y.opatowsky"], "pmid": "28333212", "doi": "10.1093/MOLBEV/MSX094"}
{"classification": "Metal binding protein", "pdb": "1Q80", "deposition_date": "2003-08-20", "title": "Solution structure and dynamics of nereis sarcoplasmic calcium binding Protein", "Keywords": ["Ll-alpha", " metal binding protein"], "authors": ["G.rabah", "R.popescu", "J.a.cox", "Y.engelborghs", "C.t.craescu"], "pmid": "15819893", "doi": "10.1111/J.1742-4658.2005.04629.X"}
{"classification": "Transferase", "pdb": "1TW1", "deposition_date": "2004-06-30", "title": "Beta-1,4-galactosyltransferase mutant met344his (m344h-gal-t1) complex With udp-galactose and magnesium", "Keywords": ["Et344his mutation; closed conformation; mn binding", " transferase"], "authors": ["B.ramakrishnan", "E.boeggeman", "P.k.qasba"], "pmid": "15449940", "doi": "10.1021/BI049007+"}
{"classification": "Rna", "pdb": "2PN4", "deposition_date": "2007-04-23", "title": "Crystal structure of hepatitis c virus ires subdomain iia", "Keywords": ["Cv", " ires", " subdoamin iia", " rna", " strontium", " hepatitis"], "authors": ["Q.zhao", "Q.han", "C.r.kissinger", "P.a.thompson"], "pmid": "18391410", "doi": "10.1107/S0907444908002011"}

View File

@ -1,6 +1,36 @@
{"pid": "Q6GZX4", "dates": [{"date": "28-JUN-2011", "date_info": " integrated into UniProtKB/Swiss-Prot."}, {"date": "19-JUL-2004", "date_info": " sequence version 1."}, {"date": "12-AUG-2020", "date_info": " entry version 41."}], "title": "Putative transcription factor 001R;", "organism_species": "Frog virus 3 (isolate Goorha) (FV-3).", "subjects": ["Viruses", "Varidnaviria", "Bamfordvirae", "Nucleocytoviricota", "Megaviricetes", "Pimascovirales", "Iridoviridae", "Alphairidovirinae", "Ranavirus."], "references": [{"PubMed": "15165820"}, {" DOI": "10.1016/j.virol.2004.02.019"}]}
{"pid": "Q6GZX3", "dates": [{"date": "28-JUN-2011", "date_info": " integrated into UniProtKB/Swiss-Prot."}, {"date": "19-JUL-2004", "date_info": " sequence version 1."}, {"date": "12-AUG-2020", "date_info": " entry version 42."}], "title": "Uncharacterized protein 002L;", "organism_species": "Frog virus 3 (isolate Goorha) (FV-3).", "subjects": ["Viruses", "Varidnaviria", "Bamfordvirae", "Nucleocytoviricota", "Megaviricetes", "Pimascovirales", "Iridoviridae", "Alphairidovirinae", "Ranavirus."], "references": [{"PubMed": "15165820"}, {" DOI": "10.1016/j.virol.2004.02.019"}]}
{"pid": "Q197F8", "dates": [{"date": "16-JUN-2009", "date_info": " integrated into UniProtKB/Swiss-Prot."}, {"date": "11-JUL-2006", "date_info": " sequence version 1."}, {"date": "12-AUG-2020", "date_info": " entry version 27."}], "title": "Uncharacterized protein 002R;", "organism_species": "Invertebrate iridescent virus 3 (IIV-3) (Mosquito iridescent virus).", "subjects": ["Viruses", "Varidnaviria", "Bamfordvirae", "Nucleocytoviricota", "Megaviricetes", "Pimascovirales", "Iridoviridae", "Betairidovirinae", "Chloriridovirus."], "references": [{"PubMed": "16912294"}, {" DOI": "10.1128/jvi.00464-06"}]}
{"pid": "Q197F7", "dates": [{"date": "16-JUN-2009", "date_info": " integrated into UniProtKB/Swiss-Prot."}, {"date": "11-JUL-2006", "date_info": " sequence version 1."}, {"date": "12-AUG-2020", "date_info": " entry version 23."}], "title": "Uncharacterized protein 003L;", "organism_species": "Invertebrate iridescent virus 3 (IIV-3) (Mosquito iridescent virus).", "subjects": ["Viruses", "Varidnaviria", "Bamfordvirae", "Nucleocytoviricota", "Megaviricetes", "Pimascovirales", "Iridoviridae", "Betairidovirinae", "Chloriridovirus."], "references": [{"PubMed": "16912294"}, {" DOI": "10.1128/jvi.00464-06"}]}
{"pid": "Q6GZX2", "dates": [{"date": "28-JUN-2011", "date_info": " integrated into UniProtKB/Swiss-Prot."}, {"date": "19-JUL-2004", "date_info": " sequence version 1."}, {"date": "12-AUG-2020", "date_info": " entry version 36."}], "title": "Uncharacterized protein 3R;", "organism_species": "Frog virus 3 (isolate Goorha) (FV-3).", "subjects": ["Viruses", "Varidnaviria", "Bamfordvirae", "Nucleocytoviricota", "Megaviricetes", "Pimascovirales", "Iridoviridae", "Alphairidovirinae", "Ranavirus."], "references": [{"PubMed": "15165820"}, {" DOI": "10.1016/j.virol.2004.02.019"}]}
{"pid": "Q6GZX1", "dates": [{"date": "28-JUN-2011", "date_info": " integrated into UniProtKB/Swiss-Prot."}, {"date": "19-JUL-2004", "date_info": " sequence version 1."}, {"date": "12-AUG-2020", "date_info": " entry version 34."}], "title": "Uncharacterized protein 004R;", "organism_species": "Frog virus 3 (isolate Goorha) (FV-3).", "subjects": ["Viruses", "Varidnaviria", "Bamfordvirae", "Nucleocytoviricota", "Megaviricetes", "Pimascovirales", "Iridoviridae", "Alphairidovirinae", "Ranavirus."], "references": [{"PubMed": "15165820"}, {" DOI": "10.1016/j.virol.2004.02.019"}]}
{"pid": " Q6GZX4", "dates": [{"date": "2011-06-28", "date_info": "integrated into UniProtKB/Swiss-Prot"}, {"date": "2004-07-19", "date_info": "sequence version 1"}, {"date": "2023-09-13", "date_info": "entry version 43"}], "title": "Putative transcription factor 001R", "organism_species": "Frog virus 3 (isolate Goorha) (FV-3)", "subjects": ["Viruses", "Varidnaviria", "Bamfordvirae", "Nucleocytoviricota", "Megaviricetes", "Pimascovirales", "Iridoviridae", "Alphairidovirinae", "Ranavirus"], "references": [{"PubMed": "15165820"}, {"DOI": "10.1016/j.virol.2004.02.019"}]}
{"pid": " Q6GZX3", "dates": [{"date": "2011-06-28", "date_info": "integrated into UniProtKB/Swiss-Prot"}, {"date": "2004-07-19", "date_info": "sequence version 1"}, {"date": "2023-09-13", "date_info": "entry version 45"}], "title": "Uncharacterized protein 002L", "organism_species": "Frog virus 3 (isolate Goorha) (FV-3)", "subjects": ["Viruses", "Varidnaviria", "Bamfordvirae", "Nucleocytoviricota", "Megaviricetes", "Pimascovirales", "Iridoviridae", "Alphairidovirinae", "Ranavirus"], "references": [{"PubMed": "15165820"}, {"DOI": "10.1016/j.virol.2004.02.019"}]}
{"pid": " Q197F8", "dates": [{"date": "2009-06-16", "date_info": "integrated into UniProtKB/Swiss-Prot"}, {"date": "2006-07-11", "date_info": "sequence version 1"}, {"date": "2022-02-23", "date_info": "entry version 29"}], "title": "Uncharacterized protein 002R", "organism_species": "Invertebrate iridescent virus 3 (IIV-3) (Mosquito iridescent virus)", "subjects": ["Viruses", "Varidnaviria", "Bamfordvirae", "Nucleocytoviricota", "Megaviricetes", "Pimascovirales", "Iridoviridae", "Betairidovirinae", "Chloriridovirus"], "references": [{"PubMed": "16912294"}, {"DOI": "10.1128/jvi.00464-06"}]}
{"pid": " Q197F7", "dates": [{"date": "2009-06-16", "date_info": "integrated into UniProtKB/Swiss-Prot"}, {"date": "2006-07-11", "date_info": "sequence version 1"}, {"date": "2020-08-12", "date_info": "entry version 23"}], "title": "Uncharacterized protein 003L", "organism_species": "Invertebrate iridescent virus 3 (IIV-3) (Mosquito iridescent virus)", "subjects": ["Viruses", "Varidnaviria", "Bamfordvirae", "Nucleocytoviricota", "Megaviricetes", "Pimascovirales", "Iridoviridae", "Betairidovirinae", "Chloriridovirus"], "references": [{"PubMed": "16912294"}, {"DOI": "10.1128/jvi.00464-06"}]}
{"pid": " Q6GZX2", "dates": [{"date": "2011-06-28", "date_info": "integrated into UniProtKB/Swiss-Prot"}, {"date": "2004-07-19", "date_info": "sequence version 1"}, {"date": "2023-09-13", "date_info": "entry version 37"}], "title": "Uncharacterized protein 3R", "organism_species": "Frog virus 3 (isolate Goorha) (FV-3)", "subjects": ["Viruses", "Varidnaviria", "Bamfordvirae", "Nucleocytoviricota", "Megaviricetes", "Pimascovirales", "Iridoviridae", "Alphairidovirinae", "Ranavirus"], "references": [{"PubMed": "15165820"}, {"DOI": "10.1016/j.virol.2004.02.019"}]}
{"pid": " Q6GZX1", "dates": [{"date": "2011-06-28", "date_info": "integrated into UniProtKB/Swiss-Prot"}, {"date": "2004-07-19", "date_info": "sequence version 1"}, {"date": "2023-09-13", "date_info": "entry version 38"}], "title": "Uncharacterized protein 004R", "organism_species": "Frog virus 3 (isolate Goorha) (FV-3)", "subjects": ["Viruses", "Varidnaviria", "Bamfordvirae", "Nucleocytoviricota", "Megaviricetes", "Pimascovirales", "Iridoviridae", "Alphairidovirinae", "Ranavirus"], "references": [{"PubMed": "15165820"}, {"DOI": "10.1016/j.virol.2004.02.019"}]}
{"pid": " Q197F5", "dates": [{"date": "2009-06-16", "date_info": "integrated into UniProtKB/Swiss-Prot"}, {"date": "2006-07-11", "date_info": "sequence version 1"}, {"date": "2022-10-12", "date_info": "entry version 32"}], "title": "Uncharacterized protein 005L", "organism_species": "Invertebrate iridescent virus 3 (IIV-3) (Mosquito iridescent virus)", "subjects": ["Viruses", "Varidnaviria", "Bamfordvirae", "Nucleocytoviricota", "Megaviricetes", "Pimascovirales", "Iridoviridae", "Betairidovirinae", "Chloriridovirus"], "references": [{"PubMed": "16912294"}, {"DOI": "10.1128/jvi.00464-06"}]}
{"pid": " Q6GZX0", "dates": [{"date": "2011-06-28", "date_info": "integrated into UniProtKB/Swiss-Prot"}, {"date": "2004-07-19", "date_info": "sequence version 1"}, {"date": "2023-09-13", "date_info": "entry version 47"}], "title": "Uncharacterized protein 005R", "organism_species": "Frog virus 3 (isolate Goorha) (FV-3)", "subjects": ["Viruses", "Varidnaviria", "Bamfordvirae", "Nucleocytoviricota", "Megaviricetes", "Pimascovirales", "Iridoviridae", "Alphairidovirinae", "Ranavirus"], "references": [{"PubMed": "15165820"}, {"DOI": "10.1016/j.virol.2004.02.019"}]}
{"pid": " Q91G88", "dates": [{"date": "2009-06-16", "date_info": "integrated into UniProtKB/Swiss-Prot"}, {"date": "2001-12-01", "date_info": "sequence version 1"}, {"date": "2023-06-28", "date_info": "entry version 53"}], "title": "Putative KilA-N domain-containing protein 006L", "organism_species": "Invertebrate iridescent virus 6 (IIV-6) (Chilo iridescent virus)", "subjects": ["Viruses", "Varidnaviria", "Bamfordvirae", "Nucleocytoviricota", "Megaviricetes", "Pimascovirales", "Iridoviridae", "Betairidovirinae", "Iridovirus"], "references": [{"PubMed": "17239238"}, {"DOI": "10.1186/1743-422x-4-11"}]}
{"pid": " Q6GZW9", "dates": [{"date": "2011-06-28", "date_info": "integrated into UniProtKB/Swiss-Prot"}, {"date": "2004-07-19", "date_info": "sequence version 1"}, {"date": "2023-09-13", "date_info": "entry version 34"}], "title": "Uncharacterized protein 006R", "organism_species": "Frog virus 3 (isolate Goorha) (FV-3)", "subjects": ["Viruses", "Varidnaviria", "Bamfordvirae", "Nucleocytoviricota", "Megaviricetes", "Pimascovirales", "Iridoviridae", "Alphairidovirinae", "Ranavirus"], "references": [{"PubMed": "15165820"}, {"DOI": "10.1016/j.virol.2004.02.019"}]}
{"pid": " Q6GZW8", "dates": [{"date": "2011-06-28", "date_info": "integrated into UniProtKB/Swiss-Prot"}, {"date": "2004-07-19", "date_info": "sequence version 1"}, {"date": "2023-09-13", "date_info": "entry version 32"}], "title": "Uncharacterized protein 007R", "organism_species": "Frog virus 3 (isolate Goorha) (FV-3)", "subjects": ["Viruses", "Varidnaviria", "Bamfordvirae", "Nucleocytoviricota", "Megaviricetes", "Pimascovirales", "Iridoviridae", "Alphairidovirinae", "Ranavirus"], "references": [{"PubMed": "15165820"}, {"DOI": "10.1016/j.virol.2004.02.019"}]}
{"pid": " Q197F3", "dates": [{"date": "2009-06-16", "date_info": "integrated into UniProtKB/Swiss-Prot"}, {"date": "2006-07-11", "date_info": "sequence version 1"}, {"date": "2023-02-22", "date_info": "entry version 28"}], "title": "Uncharacterized protein 007R", "organism_species": "Invertebrate iridescent virus 3 (IIV-3) (Mosquito iridescent virus)", "subjects": ["Viruses", "Varidnaviria", "Bamfordvirae", "Nucleocytoviricota", "Megaviricetes", "Pimascovirales", "Iridoviridae", "Betairidovirinae", "Chloriridovirus"], "references": [{"PubMed": "16912294"}, {"DOI": "10.1128/jvi.00464-06"}]}
{"pid": " Q197F2", "dates": [{"date": "2009-06-16", "date_info": "integrated into UniProtKB/Swiss-Prot"}, {"date": "2006-07-11", "date_info": "sequence version 1"}, {"date": "2022-02-23", "date_info": "entry version 22"}], "title": "Uncharacterized protein 008L", "organism_species": "Invertebrate iridescent virus 3 (IIV-3) (Mosquito iridescent virus)", "subjects": ["Viruses", "Varidnaviria", "Bamfordvirae", "Nucleocytoviricota", "Megaviricetes", "Pimascovirales", "Iridoviridae", "Betairidovirinae", "Chloriridovirus"], "references": [{"PubMed": "16912294"}, {"DOI": "10.1128/jvi.00464-06"}]}
{"pid": " Q6GZW6", "dates": [{"date": "2011-06-28", "date_info": "integrated into UniProtKB/Swiss-Prot"}, {"date": "2004-07-19", "date_info": "sequence version 1"}, {"date": "2023-09-13", "date_info": "entry version 67"}], "title": "Putative helicase 009L", "organism_species": "Frog virus 3 (isolate Goorha) (FV-3)", "subjects": ["Viruses", "Varidnaviria", "Bamfordvirae", "Nucleocytoviricota", "Megaviricetes", "Pimascovirales", "Iridoviridae", "Alphairidovirinae", "Ranavirus"], "references": [{"PubMed": "15165820"}, {"DOI": "10.1016/j.virol.2004.02.019"}]}
{"pid": " Q91G85", "dates": [{"date": "2009-06-16", "date_info": "integrated into UniProtKB/Swiss-Prot"}, {"date": "2001-12-01", "date_info": "sequence version 1"}, {"date": "2023-02-22", "date_info": "entry version 38"}], "title": "Uncharacterized protein 009R", "organism_species": "Invertebrate iridescent virus 6 (IIV-6) (Chilo iridescent virus)", "subjects": ["Viruses", "Varidnaviria", "Bamfordvirae", "Nucleocytoviricota", "Megaviricetes", "Pimascovirales", "Iridoviridae", "Betairidovirinae", "Iridovirus"], "references": [{"PubMed": "17239238"}, {"DOI": "10.1186/1743-422x-4-11"}]}
{"pid": " Q6GZW5", "dates": [{"date": "2011-06-28", "date_info": "integrated into UniProtKB/Swiss-Prot"}, {"date": "2004-07-19", "date_info": "sequence version 1"}, {"date": "2023-09-13", "date_info": "entry version 37"}], "title": "Uncharacterized protein 010R", "organism_species": "Frog virus 3 (isolate Goorha) (FV-3)", "subjects": ["Viruses", "Varidnaviria", "Bamfordvirae", "Nucleocytoviricota", "Megaviricetes", "Pimascovirales", "Iridoviridae", "Alphairidovirinae", "Ranavirus"], "references": [{"PubMed": "15165820"}, {"DOI": "10.1016/j.virol.2004.02.019"}]}
{"pid": " Q197E9", "dates": [{"date": "2009-06-16", "date_info": "integrated into UniProtKB/Swiss-Prot"}, {"date": "2006-07-11", "date_info": "sequence version 1"}, {"date": "2023-02-22", "date_info": "entry version 28"}], "title": "Uncharacterized protein 011L", "organism_species": "Invertebrate iridescent virus 3 (IIV-3) (Mosquito iridescent virus)", "subjects": ["Viruses", "Varidnaviria", "Bamfordvirae", "Nucleocytoviricota", "Megaviricetes", "Pimascovirales", "Iridoviridae", "Betairidovirinae", "Chloriridovirus"], "references": [{"PubMed": "16912294"}, {"DOI": "10.1128/jvi.00464-06"}]}
{"pid": " Q6GZW4", "dates": [{"date": "2011-06-28", "date_info": "integrated into UniProtKB/Swiss-Prot"}, {"date": "2004-07-19", "date_info": "sequence version 1"}, {"date": "2023-09-13", "date_info": "entry version 37"}], "title": "Uncharacterized protein 011R", "organism_species": "Frog virus 3 (isolate Goorha) (FV-3)", "subjects": ["Viruses", "Varidnaviria", "Bamfordvirae", "Nucleocytoviricota", "Megaviricetes", "Pimascovirales", "Iridoviridae", "Alphairidovirinae", "Ranavirus"], "references": [{"PubMed": "15165820"}, {"DOI": "10.1016/j.virol.2004.02.019"}]}
{"pid": " Q6GZW3", "dates": [{"date": "2011-06-28", "date_info": "integrated into UniProtKB/Swiss-Prot"}, {"date": "2004-07-19", "date_info": "sequence version 1"}, {"date": "2023-09-13", "date_info": "entry version 35"}], "title": "Uncharacterized protein 012L", "organism_species": "Frog virus 3 (isolate Goorha) (FV-3)", "subjects": ["Viruses", "Varidnaviria", "Bamfordvirae", "Nucleocytoviricota", "Megaviricetes", "Pimascovirales", "Iridoviridae", "Alphairidovirinae", "Ranavirus"], "references": [{"PubMed": "15165820"}, {"DOI": "10.1016/j.virol.2004.02.019"}]}
{"pid": " Q197E7", "dates": [{"date": "2009-06-16", "date_info": "integrated into UniProtKB/Swiss-Prot"}, {"date": "2006-07-11", "date_info": "sequence version 1"}, {"date": "2023-02-22", "date_info": "entry version 37"}], "title": "Uncharacterized protein IIV3-013L", "organism_species": "Invertebrate iridescent virus 3 (IIV-3) (Mosquito iridescent virus)", "subjects": ["Viruses", "Varidnaviria", "Bamfordvirae", "Nucleocytoviricota", "Megaviricetes", "Pimascovirales", "Iridoviridae", "Betairidovirinae", "Chloriridovirus"], "references": [{"PubMed": "16912294"}, {"DOI": "10.1128/jvi.00464-06"}]}
{"pid": " Q6GZW2", "dates": [{"date": "2011-06-28", "date_info": "integrated into UniProtKB/Swiss-Prot"}, {"date": "2004-07-19", "date_info": "sequence version 1"}, {"date": "2023-09-13", "date_info": "entry version 30"}], "title": "Uncharacterized protein 013R", "organism_species": "Frog virus 3 (isolate Goorha) (FV-3)", "subjects": ["Viruses", "Varidnaviria", "Bamfordvirae", "Nucleocytoviricota", "Megaviricetes", "Pimascovirales", "Iridoviridae", "Alphairidovirinae", "Ranavirus"], "references": [{"PubMed": "15165820"}, {"DOI": "10.1016/j.virol.2004.02.019"}]}
{"pid": " Q6GZW1", "dates": [{"date": "2011-06-28", "date_info": "integrated into UniProtKB/Swiss-Prot"}, {"date": "2004-07-19", "date_info": "sequence version 1"}, {"date": "2023-09-13", "date_info": "entry version 35"}], "title": "Uncharacterized protein 014R", "organism_species": "Frog virus 3 (isolate Goorha) (FV-3)", "subjects": ["Viruses", "Varidnaviria", "Bamfordvirae", "Nucleocytoviricota", "Megaviricetes", "Pimascovirales", "Iridoviridae", "Alphairidovirinae", "Ranavirus"], "references": [{"PubMed": "15165820"}, {"DOI": "10.1016/j.virol.2004.02.019"}]}
{"pid": " Q6GZW0", "dates": [{"date": "2011-06-28", "date_info": "integrated into UniProtKB/Swiss-Prot"}, {"date": "2004-07-19", "date_info": "sequence version 1"}, {"date": "2023-09-13", "date_info": "entry version 50"}], "title": "Uncharacterized protein 015R", "organism_species": "Frog virus 3 (isolate Goorha) (FV-3)", "subjects": ["Viruses", "Varidnaviria", "Bamfordvirae", "Nucleocytoviricota", "Megaviricetes", "Pimascovirales", "Iridoviridae", "Alphairidovirinae", "Ranavirus"], "references": [{"PubMed": "15165820"}, {"DOI": "10.1016/j.virol.2004.02.019"}]}
{"pid": " Q6GZV8", "dates": [{"date": "2011-06-28", "date_info": "integrated into UniProtKB/Swiss-Prot"}, {"date": "2004-07-19", "date_info": "sequence version 1"}, {"date": "2023-09-13", "date_info": "entry version 35"}], "title": "Uncharacterized protein 017L", "organism_species": "Frog virus 3 (isolate Goorha) (FV-3)", "subjects": ["Viruses", "Varidnaviria", "Bamfordvirae", "Nucleocytoviricota", "Megaviricetes", "Pimascovirales", "Iridoviridae", "Alphairidovirinae", "Ranavirus"], "references": [{"PubMed": "15165820"}, {"DOI": "10.1016/j.virol.2004.02.019"}]}
{"pid": " Q6GZV7", "dates": [{"date": "2011-06-28", "date_info": "integrated into UniProtKB/Swiss-Prot"}, {"date": "2004-07-19", "date_info": "sequence version 1"}, {"date": "2023-09-13", "date_info": "entry version 33"}], "title": "Uncharacterized protein 018L", "organism_species": "Frog virus 3 (isolate Goorha) (FV-3)", "subjects": ["Viruses", "Varidnaviria", "Bamfordvirae", "Nucleocytoviricota", "Megaviricetes", "Pimascovirales", "Iridoviridae", "Alphairidovirinae", "Ranavirus"], "references": [{"PubMed": "15165820"}, {"DOI": "10.1016/j.virol.2004.02.019"}]}
{"pid": " Q6GZV6", "dates": [{"date": "2011-06-28", "date_info": "integrated into UniProtKB/Swiss-Prot"}, {"date": "2004-07-19", "date_info": "sequence version 1"}, {"date": "2023-09-13", "date_info": "entry version 87"}], "title": "Putative serine/threonine-protein kinase 019R", "organism_species": "Frog virus 3 (isolate Goorha) (FV-3)", "subjects": ["Viruses", "Varidnaviria", "Bamfordvirae", "Nucleocytoviricota", "Megaviricetes", "Pimascovirales", "Iridoviridae", "Alphairidovirinae", "Ranavirus"], "references": [{"PubMed": "15165820"}, {"DOI": "10.1016/j.virol.2004.02.019"}]}
{"pid": " Q6GZV5", "dates": [{"date": "2011-06-28", "date_info": "integrated into UniProtKB/Swiss-Prot"}, {"date": "2004-07-19", "date_info": "sequence version 1"}, {"date": "2023-09-13", "date_info": "entry version 40"}], "title": "Uncharacterized protein 020R", "organism_species": "Frog virus 3 (isolate Goorha) (FV-3)", "subjects": ["Viruses", "Varidnaviria", "Bamfordvirae", "Nucleocytoviricota", "Megaviricetes", "Pimascovirales", "Iridoviridae", "Alphairidovirinae", "Ranavirus"], "references": [{"PubMed": "15165820"}, {"DOI": "10.1016/j.virol.2004.02.019"}]}
{"pid": " Q6GZV4", "dates": [{"date": "2011-06-28", "date_info": "integrated into UniProtKB/Swiss-Prot"}, {"date": "2004-07-19", "date_info": "sequence version 1"}, {"date": "2023-09-13", "date_info": "entry version 35"}], "title": "Uncharacterized protein 021L", "organism_species": "Frog virus 3 (isolate Goorha) (FV-3)", "subjects": ["Viruses", "Varidnaviria", "Bamfordvirae", "Nucleocytoviricota", "Megaviricetes", "Pimascovirales", "Iridoviridae", "Alphairidovirinae", "Ranavirus"], "references": [{"PubMed": "15165820"}, {"DOI": "10.1016/j.virol.2004.02.019"}]}
{"pid": " Q197D8", "dates": [{"date": "2009-06-16", "date_info": "integrated into UniProtKB/Swiss-Prot"}, {"date": "2006-07-11", "date_info": "sequence version 1"}, {"date": "2022-12-14", "date_info": "entry version 35"}], "title": "Transmembrane protein 022L", "organism_species": "Invertebrate iridescent virus 3 (IIV-3) (Mosquito iridescent virus)", "subjects": ["Viruses", "Varidnaviria", "Bamfordvirae", "Nucleocytoviricota", "Megaviricetes", "Pimascovirales", "Iridoviridae", "Betairidovirinae", "Chloriridovirus"], "references": [{"PubMed": "16912294"}, {"DOI": "10.1128/jvi.00464-06"}]}
{"pid": " Q6GZV2", "dates": [{"date": "2011-06-28", "date_info": "integrated into UniProtKB/Swiss-Prot"}, {"date": "2004-07-19", "date_info": "sequence version 1"}, {"date": "2023-09-13", "date_info": "entry version 33"}], "title": "Uncharacterized protein 023R", "organism_species": "Frog virus 3 (isolate Goorha) (FV-3)", "subjects": ["Viruses", "Varidnaviria", "Bamfordvirae", "Nucleocytoviricota", "Megaviricetes", "Pimascovirales", "Iridoviridae", "Alphairidovirinae", "Ranavirus"], "references": [{"PubMed": "15165820"}, {"DOI": "10.1016/j.virol.2004.02.019"}]}
{"pid": " Q197D7", "dates": [{"date": "2009-06-16", "date_info": "integrated into UniProtKB/Swiss-Prot"}, {"date": "2006-07-11", "date_info": "sequence version 1"}, {"date": "2023-02-22", "date_info": "entry version 25"}], "title": "Uncharacterized protein 023R", "organism_species": "Invertebrate iridescent virus 3 (IIV-3) (Mosquito iridescent virus)", "subjects": ["Viruses", "Varidnaviria", "Bamfordvirae", "Nucleocytoviricota", "Megaviricetes", "Pimascovirales", "Iridoviridae", "Betairidovirinae", "Chloriridovirus"], "references": [{"PubMed": "16912294"}, {"DOI": "10.1128/jvi.00464-06"}]}
{"pid": " Q6GZV1", "dates": [{"date": "2011-06-28", "date_info": "integrated into UniProtKB/Swiss-Prot"}, {"date": "2004-07-19", "date_info": "sequence version 1"}, {"date": "2023-09-13", "date_info": "entry version 37"}], "title": "Uncharacterized protein 024R", "organism_species": "Frog virus 3 (isolate Goorha) (FV-3)", "subjects": ["Viruses", "Varidnaviria", "Bamfordvirae", "Nucleocytoviricota", "Megaviricetes", "Pimascovirales", "Iridoviridae", "Alphairidovirinae", "Ranavirus"], "references": [{"PubMed": "15165820"}, {"DOI": "10.1016/j.virol.2004.02.019"}]}
{"pid": " Q197D5", "dates": [{"date": "2009-06-16", "date_info": "integrated into UniProtKB/Swiss-Prot"}, {"date": "2006-07-11", "date_info": "sequence version 1"}, {"date": "2022-10-12", "date_info": "entry version 24"}], "title": "Uncharacterized protein 025R", "organism_species": "Invertebrate iridescent virus 3 (IIV-3) (Mosquito iridescent virus)", "subjects": ["Viruses", "Varidnaviria", "Bamfordvirae", "Nucleocytoviricota", "Megaviricetes", "Pimascovirales", "Iridoviridae", "Betairidovirinae", "Chloriridovirus"], "references": [{"PubMed": "16912294"}, {"DOI": "10.1128/jvi.00464-06"}]}
{"pid": " Q91G70", "dates": [{"date": "2009-06-16", "date_info": "integrated into UniProtKB/Swiss-Prot"}, {"date": "2001-12-01", "date_info": "sequence version 1"}, {"date": "2020-08-12", "date_info": "entry version 32"}], "title": "Uncharacterized protein 026R", "organism_species": "Invertebrate iridescent virus 6 (IIV-6) (Chilo iridescent virus)", "subjects": ["Viruses", "Varidnaviria", "Bamfordvirae", "Nucleocytoviricota", "Megaviricetes", "Pimascovirales", "Iridoviridae", "Betairidovirinae", "Iridovirus"], "references": [{"PubMed": "17239238"}, {"DOI": "10.1186/1743-422x-4-11"}]}
{"pid": " Q6GZU9", "dates": [{"date": "2011-06-28", "date_info": "integrated into UniProtKB/Swiss-Prot"}, {"date": "2004-07-19", "date_info": "sequence version 1"}, {"date": "2023-09-13", "date_info": "entry version 49"}], "title": "Uncharacterized protein 027R", "organism_species": "Frog virus 3 (isolate Goorha) (FV-3)", "subjects": ["Viruses", "Varidnaviria", "Bamfordvirae", "Nucleocytoviricota", "Megaviricetes", "Pimascovirales", "Iridoviridae", "Alphairidovirinae", "Ranavirus"], "references": [{"PubMed": "15165820"}, {"DOI": "10.1016/j.virol.2004.02.019"}]}
{"pid": " Q6GZU8", "dates": [{"date": "2011-06-28", "date_info": "integrated into UniProtKB/Swiss-Prot"}, {"date": "2004-07-19", "date_info": "sequence version 1"}, {"date": "2023-09-13", "date_info": "entry version 55"}], "title": "Uncharacterized protein 028R", "organism_species": "Frog virus 3 (isolate Goorha) (FV-3)", "subjects": ["Viruses", "Varidnaviria", "Bamfordvirae", "Nucleocytoviricota", "Megaviricetes", "Pimascovirales", "Iridoviridae", "Alphairidovirinae", "Ranavirus"], "references": [{"PubMed": "15165820"}, {"DOI": "10.1016/j.virol.2004.02.019"}]}

View File

@ -2,7 +2,9 @@
package eu.dnetlib.dhp.broker.oa.util;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import org.apache.commons.io.IOUtils;
import org.apache.spark.sql.Row;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
@ -27,10 +29,14 @@ public class TrustUtils {
static {
mapper = new ObjectMapper();
try {
dedupConfig = mapper
.readValue(
DedupConfig.class.getResourceAsStream("/eu/dnetlib/dhp/broker/oa/dedupConfig/dedupConfig.json"),
DedupConfig.class);
dedupConfig = DedupConfig
.load(
IOUtils
.toString(
DedupConfig.class
.getResourceAsStream("/eu/dnetlib/dhp/broker/oa/dedupConfig/dedupConfig.json"),
StandardCharsets.UTF_8));
deduper = new SparkDeduper(dedupConfig);
} catch (final IOException e) {
log.error("Error loading dedupConfig, e");
@ -57,7 +63,7 @@ public class TrustUtils {
return TrustUtils.rescale(score, threshold);
} catch (final Exception e) {
log.error("Error computing score between results", e);
return BrokerConstants.MIN_TRUST;
throw new RuntimeException(e);
}
}

View File

@ -1,13 +0,0 @@
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<parent>
<artifactId>dhp-workflows</artifactId>
<groupId>eu.dnetlib.dhp</groupId>
<version>1.2.5-SNAPSHOT</version>
</parent>
<modelVersion>4.0.0</modelVersion>
<artifactId>dhp-distcp</artifactId>
</project>

View File

@ -1,18 +0,0 @@
<configuration>
<property>
<name>jobTracker</name>
<value>yarnRM</value>
</property>
<property>
<name>nameNode</name>
<value>hdfs://nameservice1</value>
</property>
<property>
<name>sourceNN</name>
<value>webhdfs://namenode2.hadoop.dm.openaire.eu:50071</value>
</property>
<property>
<name>oozie.use.system.libpath</name>
<value>true</value>
</property>
</configuration>

View File

@ -1,46 +0,0 @@
<workflow-app name="distcp" xmlns="uri:oozie:workflow:0.5">
<parameters>
<property>
<name>sourceNN</name>
<description>the source name node</description>
</property>
<property>
<name>sourcePath</name>
<description>the source path</description>
</property>
<property>
<name>targetPath</name>
<description>the target path</description>
</property>
<property>
<name>hbase_dump_distcp_memory_mb</name>
<value>6144</value>
<description>memory for distcp action copying InfoSpace dump from remote cluster</description>
</property>
<property>
<name>hbase_dump_distcp_num_maps</name>
<value>1</value>
<description>maximum number of simultaneous copies of InfoSpace dump from remote location</description>
</property>
</parameters>
<start to="distcp"/>
<kill name="Kill">
<message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<action name="distcp">
<distcp xmlns="uri:oozie:distcp-action:0.2">
<arg>-Dmapreduce.map.memory.mb=${hbase_dump_distcp_memory_mb}</arg>
<arg>-pb</arg>
<arg>-m ${hbase_dump_distcp_num_maps}</arg>
<arg>${sourceNN}/${sourcePath}</arg>
<arg>${nameNode}/${targetPath}</arg>
</distcp>
<ok to="End" />
<error to="Kill" />
</action>
<end name="End"/>
</workflow-app>

View File

@ -5,6 +5,7 @@ import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import org.apache.hadoop.conf.Configuration;
import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;

View File

@ -14,6 +14,7 @@ import org.apache.spark.SparkConf;
import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;
import org.slf4j.Logger;
@ -84,19 +85,26 @@ public class SparkCountryPropagationJob {
Dataset<R> res = readPath(spark, sourcePath, resultClazz);
log.info("Reading prepared info: {}", preparedInfoPath);
Dataset<ResultCountrySet> prepared = spark
final Dataset<Row> preparedInfoRaw = spark
.read()
.json(preparedInfoPath)
.as(Encoders.bean(ResultCountrySet.class));
res
.joinWith(prepared, res.col("id").equalTo(prepared.col("resultId")), "left_outer")
.map(getCountryMergeFn(), Encoders.bean(resultClazz))
.write()
.option("compression", "gzip")
.mode(SaveMode.Overwrite)
.json(outputPath);
.json(preparedInfoPath);
if (!preparedInfoRaw.isEmpty()) {
final Dataset<ResultCountrySet> prepared = preparedInfoRaw.as(Encoders.bean(ResultCountrySet.class));
res
.joinWith(prepared, res.col("id").equalTo(prepared.col("resultId")), "left_outer")
.map(getCountryMergeFn(), Encoders.bean(resultClazz))
.write()
.option("compression", "gzip")
.mode(SaveMode.Overwrite)
.json(outputPath);
} else {
res
.write()
.option("compression", "gzip")
.mode(SaveMode.Overwrite)
.json(outputPath);
}
}
private static <R extends Result> MapFunction<Tuple2<R, ResultCountrySet>, R> getCountryMergeFn() {

View File

@ -260,6 +260,7 @@
--conf spark.eventLog.dir=${nameNode}${spark2EventLogDir}
--conf spark.dynamicAllocation.enabled=true
--conf spark.dynamicAllocation.maxExecutors=${spark2MaxExecutors}
--conf spark.sql.shuffle.partitions=10000
</spark-opts>
<arg>--preparedInfoPath</arg><arg>${workingDir}/preparedInfo/mergedCommunityAssoc</arg>
<arg>--sourcePath</arg><arg>${sourcePath}/publication</arg>
@ -289,6 +290,7 @@
--conf spark.eventLog.dir=${nameNode}${spark2EventLogDir}
--conf spark.dynamicAllocation.enabled=true
--conf spark.dynamicAllocation.maxExecutors=${spark2MaxExecutors}
--conf spark.sql.shuffle.partitions=5000
</spark-opts>
<arg>--preparedInfoPath</arg><arg>${workingDir}/preparedInfo/mergedCommunityAssoc</arg>
<arg>--sourcePath</arg><arg>${sourcePath}/dataset</arg>
@ -318,6 +320,7 @@
--conf spark.eventLog.dir=${nameNode}${spark2EventLogDir}
--conf spark.dynamicAllocation.enabled=true
--conf spark.dynamicAllocation.maxExecutors=${spark2MaxExecutors}
--conf spark.sql.shuffle.partitions=2000
</spark-opts>
<arg>--preparedInfoPath</arg><arg>${workingDir}/preparedInfo/mergedCommunityAssoc</arg>
<arg>--sourcePath</arg><arg>${sourcePath}/otherresearchproduct</arg>
@ -347,6 +350,7 @@
--conf spark.eventLog.dir=${nameNode}${spark2EventLogDir}
--conf spark.dynamicAllocation.enabled=true
--conf spark.dynamicAllocation.maxExecutors=${spark2MaxExecutors}
--conf spark.sql.shuffle.partitions=1000
</spark-opts>
<arg>--preparedInfoPath</arg><arg>${workingDir}/preparedInfo/mergedCommunityAssoc</arg>
<arg>--sourcePath</arg><arg>${sourcePath}/software</arg>

View File

@ -49,7 +49,7 @@ public class DownloadCsvTest {
@Test
void getUnibiFileTest() throws CollectorException, IOException, ClassNotFoundException {
String fileURL = "https://pub.uni-bielefeld.de/download/2944717/2944718/issn_gold_oa_version_4.csv";
String fileURL = "https://pub.uni-bielefeld.de/download/2944717/2944718/issn_gold_oa_version_5.csv";
final String outputFile = workingDir + "/unibi_gold.json";
new DownloadCSV()

View File

@ -1067,6 +1067,28 @@ class MappersTest {
System.out.println("***************");
}
@Test
public void testD4ScienceTraining() throws IOException {
final String xml = IOUtils
.toString(Objects.requireNonNull(getClass().getResourceAsStream("d4science-1-training.xml")));
final List<Oaf> list = new OdfToOafMapper(vocs, false, true).processMdRecord(xml);
final OtherResearchProduct trainingMaterial = (OtherResearchProduct) list.get(0);
System.out.println("***************");
System.out.println(new ObjectMapper().writeValueAsString(trainingMaterial));
System.out.println("***************");
}
@Test
public void testD4ScienceDataset() throws IOException {
final String xml = IOUtils
.toString(Objects.requireNonNull(getClass().getResourceAsStream("d4science-2-dataset.xml")));
final List<Oaf> list = new OdfToOafMapper(vocs, false, true).processMdRecord(xml);
final Dataset trainingMaterial = (Dataset) list.get(0);
System.out.println("***************");
System.out.println(new ObjectMapper().writeValueAsString(trainingMaterial));
System.out.println("***************");
}
@Test
void testNotWellFormed() throws IOException {
final String xml = IOUtils

View File

@ -0,0 +1,93 @@
<?xml version="1.0" encoding="UTF-8"?>
<oai:record xmlns:dr="http://www.driver-repository.eu/namespace/dr"
xmlns:dri="http://www.driver-repository.eu/namespace/dri"
xmlns:oaf="http://namespace.openaire.eu/oaf" xmlns:oai="http://www.openarchives.org/OAI/2.0/">
<oai:header>
<dri:objIdentifier>alessia_____::104c2d4ba8878c16fa824dce5b1bea57</dri:objIdentifier>
<dri:recordIdentifier>12d8f77e-d66f-46f5-8d88-af7db23bc4c9</dri:recordIdentifier>
<dri:dateOfCollection>2023-09-08T10:12:35.864+02:00</dri:dateOfCollection>
<oaf:datasourceprefix>alessia_____</oaf:datasourceprefix>
<dr:dateOfTransformation>2023-09-08T11:31:45.692+02:00</dr:dateOfTransformation>
</oai:header>
<oai:metadata>
<datacite:resource
xmlns:datacite="http://datacite.org/schema/kernel-4"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://datacite.org/schema/kernel-4 http://schema.datacite.org/meta/kernel-4/metadata.xsd">
<datacite:identifier identifierType="URL">http://data.d4science.org/ctlg/ResourceCatalogue/visual_analytics_for_data_scientists</datacite:identifier>
<datacite:alternateIdentifiers/>
<datacite:creators>
<datacite:creator>
<datacite:creatorName>BRAGHIERI MARCO</datacite:creatorName>
</datacite:creator>
</datacite:creators>
<datacite:titles>
<datacite:title>Visual Analytics for Data Scientists</datacite:title>
</datacite:titles>
<datacite:publisher>SoBigData++</datacite:publisher>
<datacite:publicationYear/>
<datacite:dates>
<datacite:date dateType="Issued"/>
</datacite:dates>
<datacite:resourceType resourceTypeGeneral="TrainingMaterial">TrainingMaterial</datacite:resourceType>
<datacite:descriptions>
<datacite:description descriptionType="Abstract">Participants to this module shall
- Learn the principles and rules underlying the design of visual data
representations and human-computer interactions
- Understand, adapt and apply representative visual analytics methods and systems for diverse types
of data and problems
- Analyse and evaluate the structure and properties
of data to select or devise appropriate methods for data exploration
- Combine visualization, interactive techniques, and computational
processing to develop practical data analysis for problem solving
(This teaching material on Visual Analytics for Data Scientists is part of a MSc module at City University London).
The author did not intend to violate any copyright on figures or content. In case you are the legal owner of any copyrighted content, please contact info@sobigdata.eu and we will immediately remove it</datacite:description>
</datacite:descriptions>
<datacite:subjects>
<datacite:subject>Visual analytics</datacite:subject>
</datacite:subjects>
<datacite:formats>
<datacite:format>Slides</datacite:format>
<datacite:format>Other</datacite:format>
<datacite:format>PDF</datacite:format>
<datacite:format>PDF</datacite:format>
<datacite:format>PDF</datacite:format>
<datacite:format>PDF</datacite:format>
<datacite:format>PDF</datacite:format>
<datacite:format>PDF</datacite:format>
<datacite:format>PDF</datacite:format>
<datacite:format>PDF</datacite:format>
<datacite:format>PDF</datacite:format>
<datacite:format>PDF</datacite:format>
<datacite:format>ZIP</datacite:format>
</datacite:formats>
</datacite:resource>
<oaf:accessrights>OPEN</oaf:accessrights>
<dr:CobjCategory type="other">0010</dr:CobjCategory>
<oaf:dateAccepted/>
<oaf:hostedBy id="alessia_____::alessia" name="Alessia"/>
<oaf:collectedFrom id="alessia_____::alessia" name="Alessia"/>
<oaf:license>other-open</oaf:license>
<oaf:projectid>corda__h2020::871042</oaf:projectid>
</oai:metadata>
<about xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:prov="http://www.openarchives.org/OAI/2.0/provenance" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<provenance xmlns="http://www.openarchives.org/OAI/2.0/provenance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/provenance http://www.openarchives.org/OAI/2.0/provenance.xsd">
<originDescription altered="true" harvestDate="2023-09-08T10:12:35.864+02:00">
<baseURL>https%3A%2F%2Fapi.d4science.org%2Fcatalogue%2Fitems</baseURL>
<identifier/>
<datestamp/>
<metadataNamespace/>
</originDescription>
</provenance>
<oaf:datainfo>
<oaf:inferred>false</oaf:inferred>
<oaf:deletedbyinference>false</oaf:deletedbyinference>
<oaf:trust>0.9</oaf:trust>
<oaf:inferenceprovenance/>
<oaf:provenanceaction classid="sysimport:crosswalk"
classname="Harvested" schemeid="dnet:provenanceActions" schemename="dnet:provenanceActions"/>
</oaf:datainfo>
</about>
</oai:record>

View File

@ -0,0 +1,72 @@
<?xml version="1.0" encoding="UTF-8"?>
<oai:record xmlns:dr="http://www.driver-repository.eu/namespace/dr"
xmlns:dri="http://www.driver-repository.eu/namespace/dri"
xmlns:oaf="http://namespace.openaire.eu/oaf" xmlns:oai="http://www.openarchives.org/OAI/2.0/">
<oai:header>
<dri:objIdentifier>alessia_____::028879484548f4e1c630e1c503e35231</dri:objIdentifier>
<dri:recordIdentifier>4fed018e-c2ff-4afa-b7b5-1ca1beebf850</dri:recordIdentifier>
<dri:dateOfCollection>2023-09-08T12:14:27.615+02:00</dri:dateOfCollection>
<oaf:datasourceprefix>alessia_____</oaf:datasourceprefix>
<dr:dateOfTransformation>2023-09-08T12:14:51.7+02:00</dr:dateOfTransformation>
</oai:header>
<oai:metadata>
<datacite:resource
xmlns:datacite="http://datacite.org/schema/kernel-4"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://datacite.org/schema/kernel-4 http://schema.datacite.org/meta/kernel-4/metadata.xsd">
<datacite:identifier identifierType="URL">http://data.d4science.org/ctlg/ResourceCatalogue/city-to-city_migration</datacite:identifier>
<datacite:alternateIdentifiers>
<datacite:alternateIdentifier type="URL"/>
</datacite:alternateIdentifiers>
<datacite:creators>
<datacite:creator>
<datacite:creatorName>Pappalardo, Luca</datacite:creatorName>
<datacite:affiliation/>
<datacite:nameIdentifier nameIdentifierScheme="ORCID" schemeURI="http://orcid.org">0000-0002-1547-6007</datacite:nameIdentifier>
</datacite:creator>
</datacite:creators>
<datacite:titles>
<datacite:title>City-to-city migration</datacite:title>
</datacite:titles>
<datacite:publisher>SoBigData++</datacite:publisher>
<datacite:publicationYear/>
<datacite:dates>
<datacite:date dateType="Issued">2018-02-15</datacite:date>
</datacite:dates>
<datacite:resourceType resourceTypeGeneral="Dataset">Dataset</datacite:resourceType>
<datacite:descriptions>
<datacite:description descriptionType="Abstract">Census data recording the migration of people between metropolitan areas in
the US</datacite:description>
</datacite:descriptions>
<datacite:subjects>
<datacite:subject>Human Mobility data</datacite:subject>
</datacite:subjects>
<datacite:formats/>
</datacite:resource>
<oaf:accessrights>OPEN</oaf:accessrights>
<dr:CobjCategory type="dataset">0021</dr:CobjCategory>
<oaf:dateAccepted>2018-02-15</oaf:dateAccepted>
<oaf:hostedBy id="alessia_____::alessia" name="Alessia"/>
<oaf:collectedFrom id="alessia_____::alessia" name="Alessia"/>
<oaf:license>AFL-3.0</oaf:license>
<oaf:projectid>corda__h2020::871042</oaf:projectid>
</oai:metadata>
<about xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:prov="http://www.openarchives.org/OAI/2.0/provenance" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<provenance xmlns="http://www.openarchives.org/OAI/2.0/provenance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/provenance http://www.openarchives.org/OAI/2.0/provenance.xsd">
<originDescription altered="true" harvestDate="2023-09-08T12:14:27.615+02:00">
<baseURL>https%3A%2F%2Fapi.d4science.org%2Fcatalogue%2Fitems</baseURL>
<identifier/>
<datestamp/>
<metadataNamespace/>
</originDescription>
</provenance>
<oaf:datainfo>
<oaf:inferred>false</oaf:inferred>
<oaf:deletedbyinference>false</oaf:deletedbyinference>
<oaf:trust>0.9</oaf:trust>
<oaf:inferenceprovenance/>
<oaf:provenanceaction classid="sysimport:crosswalk"
classname="Harvested" schemeid="dnet:provenanceActions" schemename="dnet:provenanceActions"/>
</oaf:datainfo>
</about>
</oai:record>

View File

@ -24,10 +24,7 @@ import eu.dnetlib.dhp.oa.provision.model.RelatedEntity;
import eu.dnetlib.dhp.oa.provision.model.RelatedEntityWrapper;
import eu.dnetlib.dhp.oa.provision.utils.ContextMapper;
import eu.dnetlib.dhp.oa.provision.utils.XmlRecordFactory;
import eu.dnetlib.dhp.schema.oaf.Datasource;
import eu.dnetlib.dhp.schema.oaf.Project;
import eu.dnetlib.dhp.schema.oaf.Publication;
import eu.dnetlib.dhp.schema.oaf.Relation;
import eu.dnetlib.dhp.schema.oaf.*;
public class XmlRecordFactoryTest {
@ -196,4 +193,51 @@ public class XmlRecordFactoryTest {
assertEquals("dnet:pid_types", ((Element) pids.get(0)).attribute("schemeid").getValue());
assertEquals("dnet:pid_types", ((Element) pids.get(0)).attribute("schemename").getValue());
}
@Test
public void testD4ScienceTraining() throws DocumentException, IOException {
final ContextMapper contextMapper = new ContextMapper();
final XmlRecordFactory xmlRecordFactory = new XmlRecordFactory(contextMapper, false,
XmlConverterJob.schemaLocation);
final OtherResearchProduct p = OBJECT_MAPPER
.readValue(
IOUtils.toString(getClass().getResourceAsStream("d4science-1-training.json")),
OtherResearchProduct.class);
final String xml = xmlRecordFactory.build(new JoinedEntity<>(p));
assertNotNull(xml);
final Document doc = new SAXReader().read(new StringReader(xml));
assertNotNull(doc);
System.out.println(doc.asXML());
}
@Test
public void testD4ScienceDataset() throws DocumentException, IOException {
final ContextMapper contextMapper = new ContextMapper();
final XmlRecordFactory xmlRecordFactory = new XmlRecordFactory(contextMapper, false,
XmlConverterJob.schemaLocation);
final OtherResearchProduct p = OBJECT_MAPPER
.readValue(
IOUtils.toString(getClass().getResourceAsStream("d4science-2-dataset.json")),
OtherResearchProduct.class);
final String xml = xmlRecordFactory.build(new JoinedEntity<>(p));
assertNotNull(xml);
final Document doc = new SAXReader().read(new StringReader(xml));
assertNotNull(doc);
System.out.println(doc.asXML());
}
}

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

View File

@ -0,0 +1,586 @@
<record>
<result xmlns:dri="http://www.driver-repository.eu/namespace/dri"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<header>
<dri:objIdentifier>doi_dedup___::e225555a08a082ad8f53f179bc59c5d0</dri:objIdentifier>
<dri:dateOfCollection>2023-01-27T05:32:10Z</dri:dateOfCollection>
</header>
<metadata>
<oaf:entity xmlns:oaf="http://namespace.openaire.eu/oaf"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://namespace.openaire.eu/oaf http://namespace.openaire.eu/oaf http://www.openaire.eu/schema/0.2/oaf-0.2.xsd">
<oaf:result>
<collectedfrom name="Crossref" id="openaire____::081b82f96300b6a6e3d282bad31cb6e2"/>
<collectedfrom name="UnpayWall" id="openaire____::8ac8380272269217cb09a928c8caa993"/>
<collectedfrom name="ORCID" id="openaire____::806360c771262b4d6770e7cdf04b5c5a"/>
<collectedfrom name="Microsoft Academic Graph" id="openaire____::5f532a3fc4f1ea403f37070f59a7a53a"/>
<collectedfrom name="Oxford University Research Archive"
id="opendoar____::2290a7385ed77cc5592dc2153229f082"/>
<collectedfrom name="PubMed Central" id="opendoar____::eda80a3d5b344bc40f3bc04f65b7a357"/>
<collectedfrom name="Europe PubMed Central" id="opendoar____::8b6dd7db9af49e67306feb59a8bdc52c"/>
<originalId>10.1098/rsta.2020.0257</originalId>
<originalId>50|doiboost____::e225555a08a082ad8f53f179bc59c5d0</originalId>
<originalId>3211056089</originalId>
<originalId>50|od______1064::83eb0f76b60445d72bb7428a1b68ef1a</originalId>
<originalId>oai:ora.ox.ac.uk:uuid:9fc4563a-07e1-41d1-8b99-31ce2f8ac027</originalId>
<originalId>50|od_______267::6d978e42c57dfc79d61a84ab5be28cb8</originalId>
<originalId>oai:pubmedcentral.nih.gov:8543046</originalId>
<originalId>od_______267::6d978e42c57dfc79d61a84ab5be28cb8</originalId>
<originalId>34689630</originalId>
<originalId>PMC8543046</originalId>
<pid classid="doi" classname="Digital Object Identifier" schemeid="dnet:pid_types"
schemename="dnet:pid_types">10.1098/rsta.2020.0257
</pid>
<pid classid="pmc" classname="PubMed Central ID" schemeid="dnet:pid_types"
schemename="dnet:pid_types" inferred="false" provenanceaction="sysimport:crosswalk:repository"
trust="0.9">PMC8543046
</pid>
<pid classid="pmid" classname="PubMed ID" schemeid="dnet:pid_types" schemename="dnet:pid_types"
inferred="false" provenanceaction="sysimport:crosswalk:repository" trust="0.9">34689630
</pid>
<measure id="influence" score="5.534974E-9" class="C"/>
<measure id="popularity" score="2.0811036E-8" class="C"/>
<measure id="influence_alt" score="8" class="C"/>
<measure id="popularity_alt" score="2.736" class="C"/>
<measure id="impulse" score="8" class="C"/>
<measure id="downloads" count="1"/>
<measure id="views" count="4"/>
<title classid="alternative title" classname="alternative title" schemeid="dnet:dataCite_title"
schemename="dnet:dataCite_title">A completely automated pipeline for 3D reconstruction of
human heart from 2D cine magnetic resonance slices.
</title>
<title classid="main title" classname="main title" schemeid="dnet:dataCite_title"
schemename="dnet:dataCite_title">A completely automated pipeline for 3D reconstruction of
human heart from 2D cine magnetic resonance slices
</title>
<bestaccessright classid="OPEN" classname="Open Access" schemeid="dnet:access_modes"
schemename="dnet:access_modes"/>
<creator rank="8" URL="https://academic.microsoft.com/#/detail/2103042895">Vicente Grau</creator>
<creator rank="1" orcid="0000-0001-8198-5128"
URL="https://academic.microsoft.com/#/detail/2693349397">Abhirup Banerjee
</creator>
<creator rank="3" URL="https://academic.microsoft.com/#/detail/1425738153">Ernesto Zacur</creator>
<creator rank="6" URL="https://academic.microsoft.com/#/detail/2043191934">Robin P. Choudhury
</creator>
<creator rank="7" URL="https://academic.microsoft.com/#/detail/2166186014">Blanca Rodriguez
</creator>
<creator rank="2" URL="https://academic.microsoft.com/#/detail/2889449215">Julia Camps</creator>
<creator rank="5" URL="https://academic.microsoft.com/#/detail/264390263">Yoram Rudy</creator>
<creator rank="4" URL="https://academic.microsoft.com/#/detail/2713879776">Christopher M. Andrews
</creator>
<country classid="GB" classname="United Kingdom" schemeid="dnet:countries"
schemename="dnet:countries"/>
<dateofacceptance>2021-10-01</dateofacceptance>
<description>Cardiac magnetic resonance (CMR) imaging is a valuable modality in the diagnosis and
characterization of cardiovascular diseases, since it can identify abnormalities in structure
and function of the myocardium non-invasively and without the need for ionizing radiation.
However, in clinical practice, it is commonly acquired as a collection of separated and
independent 2D image planes, which limits its accuracy in 3D analysis. This paper presents a
completely automated pipeline for generating patient-specific 3D biventricular heart models from
cine magnetic resonance (MR) slices. Our pipeline automatically selects the relevant cine MR
images, segments them using a deep learning-based method to extract the heart contours, and
aligns the contours in 3D space correcting possible misalignments due to breathing or subject
motion first using the intensity and contours information from the cine data and next with the
help of a statistical shape model. Finally, the sparse 3D representation of the contours is used
to generate a smooth 3D biventricular mesh. The computational pipeline is applied and evaluated
in a CMR dataset of 20 healthy subjects. Our results show an average reduction of misalignment
artefacts from 1.82±1.60mm to 0.72±0.73mm over 20 subjects, in terms of distance from the
final reconstructed mesh. The high-resolution 3D biventricular meshes obtained with our
computational pipeline are used for simulations of electrical activation patterns, showing
agreement with non-invasive electrocardiographic imaging. The automatic methodologies presented
here for patient-specific MR imaging-based 3D biventricular representations contribute to the
efficient realization of precision medicine, enabling the enhanced interpretability of clinical
data, the digital twin vision through patient-specific image-based modelling and simulation, and
augmented reality applications.
This article is part of the theme issue Advanced computation in cardiovascular physiology: new
challenges and opportunities.
</description>
<subject classid="keyword" classname="keyword" schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies">General Physics and Astronomy
</subject>
<subject classid="keyword" classname="keyword" schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies">General Engineering
</subject>
<subject classid="keyword" classname="keyword" schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies">General Mathematics
</subject>
<subject classid="MAG" classname="Microsoft Academic Graph classification"
schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies">Pipeline (computing)
</subject>
<subject classid="MAG" classname="Microsoft Academic Graph classification"
schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies">Cine mri
</subject>
<subject classid="MAG" classname="Microsoft Academic Graph classification"
schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies">Structure and function
</subject>
<subject classid="MAG" classname="Microsoft Academic Graph classification"
schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies">Cardiac magnetic resonance
</subject>
<subject classid="MAG" classname="Microsoft Academic Graph classification"
schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies">Magnetic resonance imaging
</subject>
<subject classid="MAG" classname="Microsoft Academic Graph classification"
schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies" inferred="false"
provenanceaction="sysimport:actionset" trust="0.5338496">medicine.diagnostic_test
</subject>
<subject classid="MAG" classname="Microsoft Academic Graph classification"
schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies" inferred="false"
provenanceaction="sysimport:actionset" trust="0.5338496">medicine
</subject>
<subject classid="MAG" classname="Microsoft Academic Graph classification"
schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies">Human heart
</subject>
<subject classid="MAG" classname="Microsoft Academic Graph classification"
schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies">Modality (humancomputer interaction)
</subject>
<subject classid="MAG" classname="Microsoft Academic Graph classification"
schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies">3D reconstruction
</subject>
<subject classid="MAG" classname="Microsoft Academic Graph classification"
schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies">Computer science
</subject>
<subject classid="MAG" classname="Microsoft Academic Graph classification"
schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies">Nuclear magnetic resonance
</subject>
<subject classid="SDG" classname="Sustainable Development Goals"
schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies" inferred="true"
inferenceprovenance="update" provenanceaction="subject:sdg">3. Good health
</subject>
<subject classid="FOS" classname="Fields of Science and Technology classification"
schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies" inferred="true"
inferenceprovenance="update" provenanceaction="subject:fos">03 medical and health sciences
</subject>
<subject classid="FOS" classname="Fields of Science and Technology classification"
schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies" inferred="true"
inferenceprovenance="update" provenanceaction="subject:fos">0302 clinical medicine
</subject>
<subject classid="FOS" classname="Fields of Science and Technology classification"
schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies" inferred="true"
inferenceprovenance="update" provenanceaction="subject:fos">030218 Nuclear Medicine &amp;
Medical Imaging
</subject>
<subject classid="FOS" classname="Fields of Science and Technology classification"
schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies" inferred="true"
inferenceprovenance="update" provenanceaction="subject:fos">03021801 Radiology/Image
segmentation
</subject>
<subject classid="FOS" classname="Fields of Science and Technology classification"
schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies" inferred="true"
inferenceprovenance="update" provenanceaction="subject:fos">03021801 Radiology/Image
segmentation - deep learning/datum
</subject>
<subject classid="FOS" classname="Fields of Science and Technology classification"
schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies" inferred="true"
inferenceprovenance="update" provenanceaction="subject:fos">030204 Cardiovascular System
&amp; Hematology
</subject>
<subject classid="FOS" classname="Fields of Science and Technology classification"
schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies" inferred="true"
inferenceprovenance="update" provenanceaction="subject:fos">03020401 Aging-associated
diseases/Heart diseases
</subject>
<subject classid="FOS" classname="Fields of Science and Technology classification"
schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies" inferred="true"
inferenceprovenance="update" provenanceaction="subject:fos">030217 Neurology &amp;
Neurosurgery
</subject>
<subject classid="FOS" classname="Fields of Science and Technology classification"
schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies" inferred="true"
inferenceprovenance="update" provenanceaction="subject:fos">03021701 Brain/Neural circuits
</subject>
<subject classid="keyword" classname="keyword" schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies" inferred="false"
provenanceaction="sysimport:crosswalk:repository" trust="0.9">Articles
</subject>
<subject classid="keyword" classname="keyword" schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies" inferred="false"
provenanceaction="sysimport:crosswalk:repository" trust="0.9">Research Articles
</subject>
<subject classid="keyword" classname="keyword" schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies" inferred="false"
provenanceaction="sysimport:crosswalk:repository" trust="0.9">cardiac mesh reconstruction
</subject>
<subject classid="keyword" classname="keyword" schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies" inferred="false"
provenanceaction="sysimport:crosswalk:repository" trust="0.9">cine MRI
</subject>
<subject classid="keyword" classname="keyword" schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies" inferred="false"
provenanceaction="sysimport:crosswalk:repository" trust="0.9">misalignment correction
</subject>
<subject classid="keyword" classname="keyword" schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies" inferred="false"
provenanceaction="sysimport:crosswalk:repository" trust="0.9">electrophysiological
simulation
</subject>
<subject classid="keyword" classname="keyword" schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies" inferred="false"
provenanceaction="sysimport:crosswalk:repository" trust="0.9">ECGI
</subject>
<subject classid="keyword" classname="keyword" schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies" inferred="false"
provenanceaction="sysimport:actionset" trust="0.9">Heart
</subject>
<subject classid="keyword" classname="keyword" schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies" inferred="false"
provenanceaction="sysimport:actionset" trust="0.9">Humans
</subject>
<subject classid="keyword" classname="keyword" schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies" inferred="false"
provenanceaction="sysimport:actionset" trust="0.9">Imaging, Three-Dimensional
</subject>
<subject classid="keyword" classname="keyword" schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies" inferred="false"
provenanceaction="sysimport:actionset" trust="0.9">Magnetic Resonance Imaging
</subject>
<subject classid="keyword" classname="keyword" schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies" inferred="false"
provenanceaction="sysimport:actionset" trust="0.9">Magnetic Resonance Imaging, Cine
</subject>
<subject classid="keyword" classname="keyword" schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies" inferred="false"
provenanceaction="sysimport:actionset" trust="0.9">Magnetic Resonance Spectroscopy
</subject>
<language classid="und" classname="Undetermined" schemeid="dnet:languages"
schemename="dnet:languages"/>
<relevantdate classid="created" classname="created" schemeid="dnet:dataCite_date"
schemename="dnet:dataCite_date">2021-10-25
</relevantdate>
<relevantdate classid="published-online" classname="published-online" schemeid="dnet:dataCite_date"
schemename="dnet:dataCite_date">2021-10-25
</relevantdate>
<relevantdate classid="published-print" classname="published-print" schemeid="dnet:dataCite_date"
schemename="dnet:dataCite_date">2021-12-13
</relevantdate>
<relevantdate classid="UNKNOWN" classname="UNKNOWN" schemeid="dnet:dataCite_date"
schemename="dnet:dataCite_date" inferred="false"
provenanceaction="sysimport:crosswalk:repository" trust="0.9">2021-01-01
</relevantdate>
<relevantdate classid="available" classname="available" schemeid="dnet:dataCite_date"
schemename="dnet:dataCite_date" inferred="false"
provenanceaction="sysimport:crosswalk:repository" trust="0.9">2023-01-05
</relevantdate>
<relevantdate classid="Accepted" classname="Accepted" schemeid="dnet:dataCite_date"
schemename="dnet:dataCite_date" inferred="false"
provenanceaction="sysimport:crosswalk:repository" trust="0.9">2021-05-28
</relevantdate>
<relevantdate classid="issued" classname="issued" schemeid="dnet:dataCite_date"
schemename="dnet:dataCite_date" inferred="false"
provenanceaction="sysimport:crosswalk:repository" trust="0.9">2023-01-05
</relevantdate>
<publisher>The Royal Society</publisher>
<source>Crossref</source>
<source/>
<source>Philosophical transactions. Series A, Mathematical, physical, and engineering sciences
</source>
<resulttype classid="publication" classname="publication" schemeid="dnet:result_typologies"
schemename="dnet:result_typologies"/>
<resourcetype classid="UNKNOWN" classname="UNKNOWN" schemeid="dnet:dataCite_resource"
schemename="dnet:dataCite_resource"/>
<journal issn="1364-503X" eissn="1471-2962" vol="379">Philosophical Transactions of the Royal
Society A: Mathematical, Physical and Engineering Sciences
</journal>
<context id="dth" label="Digital Twins in Health" type="community"></context>
<context id="EC" label="European Commission" type="funding">
<category id="EC::H2020" label="Horizon 2020 Framework Programme">
<concept id="EC::H2020::RIA" label="Research and Innovation action"/>
</category>
</context>
<datainfo>
<inferred>true</inferred>
<deletedbyinference>false</deletedbyinference>
<trust>0.8</trust>
<inferenceprovenance>dedup-result-decisiontree-v3</inferenceprovenance>
<provenanceaction classid="sysimport:dedup" classname="Inferred by OpenAIRE"
schemeid="dnet:provenanceActions" schemename="dnet:provenanceActions"/>
</datainfo>
<rels>
<rel inferred="false" trust="0.9" inferenceprovenance="" provenanceaction="sysimport:actionset">
<to class="hasAuthorInstitution" scheme="dnet:result_organization_relations"
type="organization">openorgs____::6a7b1b4c40a067a1f209de6867fe094d
</to>
<country classid="GB" classname="United Kingdom" schemeid="dnet:countries"
schemename="dnet:countries"/>
<legalname>University of Oxford</legalname>
<legalshortname>University of Oxford</legalshortname>
</rel>
<rel inferred="false" trust="0.9" inferenceprovenance="" provenanceaction="sysimport:actionset">
<to class="IsSupplementedBy" scheme="dnet:result_result_relations" type="publication">
doi_dedup___::015b27b0b7c55649236bf23a5c75f817
</to>
<pid classid="doi" classname="Digital Object Identifier" schemeid="dnet:pid_types"
schemename="dnet:pid_types" inferred="false" provenanceaction="sysimport:actionset"
trust="0.9">10.6084/m9.figshare.15656924.v2
</pid>
<dateofacceptance>2021-01-01</dateofacceptance>
<title classid="main title" classname="main title" schemeid="dnet:dataCite_title"
schemename="dnet:dataCite_title">Implementation Details of the Reconstruction
Pipeline and Electrophysiological Inference Results from A completely automated pipeline
for 3D reconstruction of human heart from 2D cine magnetic resonance slices
</title>
<publisher>The Royal Society</publisher>
<collectedfrom name="Datacite" id="openaire____::9e3be59865b2c1c335d32dae2fe7b254"/>
<pid classid="doi" classname="Digital Object Identifier" schemeid="dnet:pid_types"
schemename="dnet:pid_types" inferred="false" provenanceaction="sysimport:actionset"
trust="0.9">10.6084/m9.figshare.15656924
</pid>
</rel>
<rel inferred="false" trust="0.9" inferenceprovenance="" provenanceaction="sysimport:actionset">
<to class="isProducedBy" scheme="dnet:result_project_relations" type="project">
corda__h2020::27f89b49dee12d828cc0f90f51727204
</to>
<code>823712</code>
<funding>
<funder id="ec__________::EC" shortname="EC" name="European Commission"
jurisdiction="EU"/>
<funding_level_0 name="H2020">ec__________::EC::H2020</funding_level_0>
<funding_level_1 name="RIA">ec__________::EC::H2020::RIA</funding_level_1>
</funding>
<acronym>CompBioMed2</acronym>
<title>A Centre of Excellence in Computational Biomedicine</title>
</rel>
<rel inferred="false" trust="0.9" inferenceprovenance="" provenanceaction="sysimport:actionset">
<to class="IsSupplementedBy" scheme="dnet:result_result_relations" type="dataset">
doi_dedup___::be1ef3b30a8d7aa7e4dfe1570d5febf7
</to>
<dateofacceptance>2021-01-01</dateofacceptance>
<pid classid="doi" classname="Digital Object Identifier" schemeid="dnet:pid_types"
schemename="dnet:pid_types" inferred="false" provenanceaction="sysimport:actionset"
trust="0.9">10.6084/m9.figshare.15656927
</pid>
<publisher>The Royal Society</publisher>
<pid classid="doi" classname="Digital Object Identifier" schemeid="dnet:pid_types"
schemename="dnet:pid_types" inferred="false" provenanceaction="sysimport:actionset"
trust="0.9">10.6084/m9.figshare.15656927.v1
</pid>
<collectedfrom name="Datacite" id="openaire____::9e3be59865b2c1c335d32dae2fe7b254"/>
<title classid="main title" classname="main title" schemeid="dnet:dataCite_title"
schemename="dnet:dataCite_title">Montage Video of the Stepwise Performance of 3D
Reconstruction Pipeline on All 20 Patients from A completely automated pipeline for 3D
reconstruction of human heart from 2D cine magnetic resonance slices
</title>
</rel>
<rel inferred="false" trust="0.9" inferenceprovenance="" provenanceaction="sysimport:actionset">
<to class="IsSupplementedBy" scheme="dnet:result_result_relations" type="dataset">
doi_________::9f9f2328e11d379b14cb888209e33088
</to>
<dateofacceptance>2021-01-01</dateofacceptance>
<pid classid="doi" classname="Digital Object Identifier" schemeid="dnet:pid_types"
schemename="dnet:pid_types" inferred="false" provenanceaction="sysimport:actionset"
trust="0.9">10.6084/m9.figshare.15656924.v1
</pid>
<title classid="main title" classname="main title" schemeid="dnet:dataCite_title"
schemename="dnet:dataCite_title">Implementation Details of the Reconstruction
Pipeline and Electrophysiological Inference Results from A completely automated pipeline
for 3D reconstruction of human heart from 2D cine magnetic resonance slices
</title>
<publisher>The Royal Society</publisher>
<collectedfrom name="Datacite" id="openaire____::9e3be59865b2c1c335d32dae2fe7b254"/>
</rel>
</rels>
<children>
<result objidentifier="pmid________::9c855c9c4a8018df23f9d51f879b62b1">
<dateofacceptance>2021-10-01</dateofacceptance>
<pid classid="pmid" classname="PubMed ID" schemeid="dnet:pid_types"
schemename="dnet:pid_types" inferred="false"
provenanceaction="sysimport:crosswalk:repository" trust="0.9">34689630
</pid>
<collectedfrom name="PubMed Central" id="opendoar____::eda80a3d5b344bc40f3bc04f65b7a357"/>
<pid classid="pmid" classname="PubMed ID" schemeid="dnet:pid_types"
schemename="dnet:pid_types" inferred="false" provenanceaction="sysimport:actionset"
trust="0.9">34689630
</pid>
<publisher>The Royal Society</publisher>
<collectedfrom name="Europe PubMed Central"
id="opendoar____::8b6dd7db9af49e67306feb59a8bdc52c"/>
<pid classid="pmc" classname="PubMed Central ID" schemeid="dnet:pid_types"
schemename="dnet:pid_types" inferred="false"
provenanceaction="sysimport:crosswalk:repository" trust="0.9">PMC8543046
</pid>
<title classid="main title" classname="main title" schemeid="dnet:dataCite_title"
schemename="dnet:dataCite_title" inferred="false"
provenanceaction="sysimport:crosswalk:repository" trust="0.9">A completely automated
pipeline for 3D reconstruction of human heart from 2D cine magnetic resonance slices
</title>
<pid classid="pmc" classname="PubMed Central ID" schemeid="dnet:pid_types"
schemename="dnet:pid_types" inferred="false" provenanceaction="sysimport:actionset"
trust="0.9">PMC8543046
</pid>
</result>
<result objidentifier="od______1064::83eb0f76b60445d72bb7428a1b68ef1a">
<dateofacceptance>2023-01-05</dateofacceptance>
<publisher>Royal Society</publisher>
<collectedfrom name="Oxford University Research Archive"
id="opendoar____::2290a7385ed77cc5592dc2153229f082"/>
<title classid="main title" classname="main title" schemeid="dnet:dataCite_title"
schemename="dnet:dataCite_title" inferred="false"
provenanceaction="sysimport:crosswalk:repository" trust="0.9">A completely automated
pipeline for 3D reconstruction of human heart from 2D cine magnetic resonance slices
</title>
</result>
<result objidentifier="doi_________::e225555a08a082ad8f53f179bc59c5d0">
<dateofacceptance>2021-10-25</dateofacceptance>
<publisher>The Royal Society</publisher>
<collectedfrom name="Crossref" id="openaire____::081b82f96300b6a6e3d282bad31cb6e2"/>
<title classid="alternative title" classname="alternative title"
schemeid="dnet:dataCite_title" schemename="dnet:dataCite_title">A completely
automated pipeline for 3D reconstruction of human heart from 2D cine magnetic resonance
slices.
</title>
<collectedfrom name="UnpayWall" id="openaire____::8ac8380272269217cb09a928c8caa993"/>
<collectedfrom name="ORCID" id="openaire____::806360c771262b4d6770e7cdf04b5c5a"/>
<collectedfrom name="Microsoft Academic Graph"
id="openaire____::5f532a3fc4f1ea403f37070f59a7a53a"/>
<pid classid="doi" classname="Digital Object Identifier" schemeid="dnet:pid_types"
schemename="dnet:pid_types">10.1098/rsta.2020.0257
</pid>
</result>
<instance>
<accessright classid="OPEN" classname="Open Access" schemeid="dnet:access_modes"
schemename="dnet:access_modes"/>
<collectedfrom name="Oxford University Research Archive"
id="opendoar____::2290a7385ed77cc5592dc2153229f082"/>
<hostedby name="Oxford University Research Archive"
id="opendoar____::2290a7385ed77cc5592dc2153229f082"/>
<dateofacceptance>2023-01-05</dateofacceptance>
<instancetype classid="0038" classname="Other literature type"
schemeid="dnet:publication_resource" schemename="dnet:publication_resource"/>
<alternateidentifier classid="pmid" classname="PubMed ID" schemeid="dnet:pid_types"
schemename="dnet:pid_types" inferred="false"
provenanceaction="sysimport:crosswalk:repository" trust="0.9">34689630
</alternateidentifier>
<alternateidentifier classid="doi" classname="Digital Object Identifier"
schemeid="dnet:pid_types" schemename="dnet:pid_types" inferred="false"
provenanceaction="sysimport:crosswalk:repository" trust="0.9">
10.1098/rsta.2020.0257
</alternateidentifier>
<refereed classid="0000" classname="UNKNOWN" schemeid="dnet:review_levels"
schemename="dnet:review_levels"/>
<license>http://creativecommons.org/licenses/by/4.0/</license>
<webresource>
<url>https://ora.ox.ac.uk/objects/uuid:9fc4563a-07e1-41d1-8b99-31ce2f8ac027</url>
</webresource>
</instance>
<instance>
<accessright classid="OPEN" classname="Open Access" schemeid="dnet:access_modes"
schemename="dnet:access_modes"/>
<collectedfrom name="UnpayWall" id="openaire____::8ac8380272269217cb09a928c8caa993"/>
<hostedby
name="Philosophical Transactions of the Royal Society A Mathematical Physical and Engineering Sciences"
id="issn___print::c72e72d9860ed6d9ec013e20c9421ba0"/>
<instancetype classid="0001" classname="Article" schemeid="dnet:publication_resource"
schemename="dnet:publication_resource"/>
<pid classid="doi" classname="Digital Object Identifier" schemeid="dnet:pid_types"
schemename="dnet:pid_types">10.1098/rsta.2020.0257
</pid>
<refereed classid="0000" classname="UNKNOWN" schemeid="dnet:review_levels"
schemename="dnet:review_levels"/>
<webresource>
<url>https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8543046</url>
</webresource>
</instance>
<instance>
<accessright classid="CLOSED" classname="Closed Access" schemeid="dnet:access_modes"
schemename="dnet:access_modes"/>
<collectedfrom name="Crossref" id="openaire____::081b82f96300b6a6e3d282bad31cb6e2"/>
<hostedby
name="Philosophical Transactions of the Royal Society A Mathematical Physical and Engineering Sciences"
id="issn___print::c72e72d9860ed6d9ec013e20c9421ba0"/>
<dateofacceptance>2021-10-25</dateofacceptance>
<instancetype classid="0001" classname="Article" schemeid="dnet:publication_resource"
schemename="dnet:publication_resource"/>
<pid classid="doi" classname="Digital Object Identifier" schemeid="dnet:pid_types"
schemename="dnet:pid_types">10.1098/rsta.2020.0257
</pid>
<refereed classid="0000" classname="UNKNOWN" schemeid="dnet:review_levels"
schemename="dnet:review_levels"/>
<license>https://royalsociety.org/journals/ethics-policies/data-sharing-mining/</license>
<webresource>
<url>https://doi.org/10.1098/rsta.2020.0257</url>
</webresource>
</instance>
<instance>
<accessright classid="UNKNOWN" classname="not available" schemeid="dnet:access_modes"
schemename="dnet:access_modes"/>
<collectedfrom name="Europe PubMed Central"
id="opendoar____::8b6dd7db9af49e67306feb59a8bdc52c"/>
<hostedby name="Unknown Repository" id="openaire____::55045bd2a65019fd8e6741a755395c8c"/>
<dateofacceptance>2021-10-25</dateofacceptance>
<instancetype classid="0001" classname="Article" schemeid="dnet:publication_resource"
schemename="dnet:publication_resource"/>
<pid classid="pmid" classname="PubMed ID" schemeid="dnet:pid_types"
schemename="dnet:pid_types" inferred="false" provenanceaction="sysimport:actionset"
trust="0.9">34689630
</pid>
<pid classid="pmc" classname="PubMed Central ID" schemeid="dnet:pid_types"
schemename="dnet:pid_types" inferred="false" provenanceaction="sysimport:actionset"
trust="0.9">PMC8543046
</pid>
<alternateidentifier classid="doi" classname="Digital Object Identifier"
schemeid="dnet:pid_types" schemename="dnet:pid_types" inferred="false"
provenanceaction="sysimport:actionset" trust="0.9">
10.1098/rsta.2020.0257
</alternateidentifier>
<refereed classid="0000" classname="UNKNOWN" schemeid="dnet:review_levels"
schemename="dnet:review_levels"/>
<webresource>
<url>https://pubmed.ncbi.nlm.nih.gov/34689630</url>
</webresource>
</instance>
<instance>
<accessright classid="OPEN" classname="Open Access" schemeid="dnet:access_modes"
schemename="dnet:access_modes"/>
<collectedfrom name="PubMed Central" id="opendoar____::eda80a3d5b344bc40f3bc04f65b7a357"/>
<hostedby name="Europe PubMed Central" id="opendoar____::8b6dd7db9af49e67306feb59a8bdc52c"/>
<dateofacceptance>2021-10-01</dateofacceptance>
<instancetype classid="0001" classname="Article" schemeid="dnet:publication_resource"
schemename="dnet:publication_resource"/>
<pid classid="pmid" classname="PubMed ID" schemeid="dnet:pid_types"
schemename="dnet:pid_types" inferred="false"
provenanceaction="sysimport:crosswalk:repository" trust="0.9">34689630
</pid>
<pid classid="pmc" classname="PubMed Central ID" schemeid="dnet:pid_types"
schemename="dnet:pid_types" inferred="false"
provenanceaction="sysimport:crosswalk:repository" trust="0.9">PMC8543046
</pid>
<alternateidentifier classid="doi" classname="Digital Object Identifier"
schemeid="dnet:pid_types" schemename="dnet:pid_types" inferred="false"
provenanceaction="sysimport:crosswalk:repository" trust="0.9">
10.1098/rsta.2020.0257
</alternateidentifier>
<refereed classid="0000" classname="UNKNOWN" schemeid="dnet:review_levels"
schemename="dnet:review_levels"/>
<webresource>
<url>http://europepmc.org/articles/PMC8543046</url>
</webresource>
</instance>
</children>
</oaf:result>
</oaf:entity>
</metadata>
</result>
</record>

View File

@ -0,0 +1,238 @@
<record>
<result xmlns:dri="http://www.driver-repository.eu/namespace/dri"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<header>
<dri:objIdentifier>doi_dedup___::88a9861b26cdda1c3dd162d11f0bedbe</dri:objIdentifier>
<dri:dateOfCollection>2023-03-13T00:12:27+0000</dri:dateOfCollection>
<dri:dateOfTransformation>2023-03-13T04:39:52.807Z</dri:dateOfTransformation>
</header>
<metadata>
<oaf:entity xmlns:oaf="http://namespace.openaire.eu/oaf"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://namespace.openaire.eu/oaf http://namespace.openaire.eu/oaf http://www.openaire.eu/schema/0.2/oaf-0.2.xsd">
<oaf:result>
<collectedfrom name="NARCIS" id="openaire____::fdb035c8b3e0540a8d9a561a6c44f4de" />
<collectedfrom name="Crossref" id="openaire____::081b82f96300b6a6e3d282bad31cb6e2" />
<collectedfrom name="UnpayWall" id="openaire____::8ac8380272269217cb09a928c8caa993" />
<collectedfrom name="ORCID" id="openaire____::806360c771262b4d6770e7cdf04b5c5a" />
<collectedfrom name="Microsoft Academic Graph" id="openaire____::5f532a3fc4f1ea403f37070f59a7a53a" />
<collectedfrom name="Europe PubMed Central" id="opendoar____::8b6dd7db9af49e67306feb59a8bdc52c" />
<collectedfrom name="NARCIS" id="eurocrisdris::fe4903425d9040f680d8610d9079ea14" />
<originalId>oai:pure.eur.nl:publications/70bf9bd0-5ea6-45fd-bb62-8d79b48cd69f</originalId>
<originalId>50|narcis______::3df5b8b060b819af0d439dd6751c8a77</originalId>
<originalId>10.2196/33081</originalId>
<originalId>50|doiboost____::88a9861b26cdda1c3dd162d11f0bedbe</originalId>
<originalId>3212148341</originalId>
<originalId>od_______267::276eb3ebee07cf1f3e8bfc43926fd0c2</originalId>
<originalId>35099399</originalId>
<originalId>PMC8844982</originalId>
<originalId>oai:services.nod.dans.knaw.nl:Publications/eur:oai:pure.eur.nl:publications/70bf9bd0-5ea6-45fd-bb62-8d79b48cd69f</originalId>
<originalId>50|dris___00893::107e97e645cbb06fb7b454ce2569d6c2</originalId>
<pid classid="doi" classname="Digital Object Identifier" schemeid="dnet:pid_types" schemename="dnet:pid_types">10.2196/33081</pid>
<pid classid="pmid" classname="PubMed ID" schemeid="dnet:pid_types" schemename="dnet:pid_types" inferred="false" provenanceaction="sysimport:actionset" trust="0.9">35099399</pid>
<pid classid="pmc" classname="PubMed Central ID" schemeid="dnet:pid_types" schemename="dnet:pid_types" inferred="false" provenanceaction="sysimport:actionset" trust="0.9">PMC8844982</pid>
<measure id="influence" score="4.9560573E-9" class="C" />
<measure id="popularity" score="1.6705286E-8" class="C" />
<measure id="influence_alt" score="6" class="C" />
<measure id="popularity_alt" score="2.16" class="C" />
<measure id="impulse" score="6" class="C" />
<title classid="alternative title" classname="alternative title" schemeid="dnet:dataCite_title" schemename="dnet:dataCite_title">Mapping the Ethical Issues of Digital Twins for Personalised Healthcare Service (Preprint)</title>
<title classid="subtitle" classname="subtitle" schemeid="dnet:dataCite_title" schemename="dnet:dataCite_title" inferred="false" provenanceaction="sysimport:crosswalk:datasetarchive" trust="0.9">Preliminary Mapping Study</title>
<title classid="main title" classname="main title" schemeid="dnet:dataCite_title" schemename="dnet:dataCite_title" inferred="false" provenanceaction="sysimport:crosswalk:repository" trust="0.9">Ethical Issues of Digital Twins for Personalized Health Care Service: Preliminary Mapping Study</title>
<bestaccessright classid="OPEN" classname="Open Access" schemeid="dnet:access_modes" schemename="dnet:access_modes" />
<creator rank="2" URL="https://academic.microsoft.com/#/detail/3211925536" orcid="0000-0001-9882-0389">Ki-hun Kim</creator>
<creator rank="1" URL="https://academic.microsoft.com/#/detail/3214669734" orcid="0000-0001-5334-8615">Pei-hua Huang</creator>
<creator rank="3" URL="https://academic.microsoft.com/#/detail/3214611946" orcid="0000-0003-4283-9659">Maartje Schermer</creator>
<contributor>Public Health</contributor>
<country classid="NL" classname="Netherlands" schemeid="dnet:countries" schemename="dnet:countries" />
<dateofacceptance>2022-01-01</dateofacceptance>
<description>Background: The concept of digital twins has great potential for transforming the existing health care system by making it more personalized. As a convergence of health care, artificial intelligence, and information and communication technologies, personalized health care services that are developed under the concept of digital twins raise a myriad of ethical issues. Although some of the ethical issues are known to researchers working on digital health and personalized medicine, currently, there is no comprehensive review that maps the major ethical risks of digital twins for personalized health care services. Objective This study aims to fill the research gap by identifying the major ethical risks of digital twins for personalized health care services. We first propose a working definition for digital twins for personalized health care services to facilitate future discussions on the ethical issues related to these emerging digital health services. We then develop a process-oriented ethical map to identify the major ethical risks in each of the different data processing phases. MethodsWe resorted to the literature on eHealth, personalized medicine, precision medicine, and information engineering to identify potential issues and developed a process-oriented ethical map to structure the inquiry in a more systematic way. The ethical map allows us to see how each of the major ethical concerns emerges during the process of transforming raw data into valuable information. Developers of a digital twin for personalized health care service may use this map to identify ethical risks during the development stage in a more systematic way and can proactively address them. ResultsThis paper provides a working definition of digital twins for personalized health care services by identifying 3 features that distinguish the new application from other eHealth services. On the basis of the working definition, this paper further layouts 10 major operational problems and the corresponding ethical risks. ConclusionsIt is challenging to address all the major ethical risks that a digital twin for a personalized health care service might encounter proactively without a conceptual map at hand. The process-oriented ethical map we propose here can assist the developers of digital twins for personalized health care services in analyzing ethical risks in a more systematic manner. </description>
<subject classid="mesh" classname="Medical Subject Headings" schemeid="dnet:subject_classification_typologies" schemename="dnet:subject_classification_typologies" inferred="true" inferenceprovenance="iis::document_classes" provenanceaction="iis" trust="0.891">education</subject>
<subject classid="keyword" classname="keyword" schemeid="dnet:subject_classification_typologies" schemename="dnet:subject_classification_typologies">Health Informatics</subject>
<subject classid="MAG" classname="Microsoft Academic Graph classification" schemeid="dnet:subject_classification_typologies" schemename="dnet:subject_classification_typologies">Ethical issues</subject>
<subject classid="MAG" classname="Microsoft Academic Graph classification" schemeid="dnet:subject_classification_typologies" schemename="dnet:subject_classification_typologies">Healthcare service</subject>
<subject classid="MAG" classname="Microsoft Academic Graph classification" schemeid="dnet:subject_classification_typologies" schemename="dnet:subject_classification_typologies">Psychology</subject>
<subject classid="MAG" classname="Microsoft Academic Graph classification" schemeid="dnet:subject_classification_typologies" schemename="dnet:subject_classification_typologies">Internet privacy</subject>
<subject classid="MAG" classname="Microsoft Academic Graph classification" schemeid="dnet:subject_classification_typologies" schemename="dnet:subject_classification_typologies" inferred="false" provenanceaction="sysimport:actionset" trust="0.4395309">business.industry</subject>
<subject classid="MAG" classname="Microsoft Academic Graph classification" schemeid="dnet:subject_classification_typologies" schemename="dnet:subject_classification_typologies" inferred="false" provenanceaction="sysimport:actionset" trust="0.4395309">business</subject>
<subject classid="MAG" classname="Microsoft Academic Graph classification" schemeid="dnet:subject_classification_typologies" schemename="dnet:subject_classification_typologies">Preprint</subject>
<subject classid="keyword" classname="keyword" schemeid="dnet:subject_classification_typologies" schemename="dnet:subject_classification_typologies" inferred="false" provenanceaction="sysimport:actionset" trust="0.9">Artificial Intelligence</subject>
<subject classid="keyword" classname="keyword" schemeid="dnet:subject_classification_typologies" schemename="dnet:subject_classification_typologies" inferred="false" provenanceaction="sysimport:actionset" trust="0.9">Delivery of Health Care</subject>
<subject classid="keyword" classname="keyword" schemeid="dnet:subject_classification_typologies" schemename="dnet:subject_classification_typologies" inferred="false" provenanceaction="sysimport:actionset" trust="0.9">Health Services</subject>
<subject classid="keyword" classname="keyword" schemeid="dnet:subject_classification_typologies" schemename="dnet:subject_classification_typologies" inferred="false" provenanceaction="sysimport:actionset" trust="0.9">Humans</subject>
<subject classid="keyword" classname="keyword" schemeid="dnet:subject_classification_typologies" schemename="dnet:subject_classification_typologies" inferred="false" provenanceaction="sysimport:actionset" trust="0.9">Precision Medicine</subject>
<subject classid="keyword" classname="keyword" schemeid="dnet:subject_classification_typologies" schemename="dnet:subject_classification_typologies" inferred="false" provenanceaction="sysimport:actionset" trust="0.9">Telemedicine</subject>
<subject classid="FOS" classname="Fields of Science and Technology classification"
schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies" inferred="true"
inferenceprovenance="update" provenanceaction="subject:fos">03 medical and health sciences</subject>
<subject classid="FOS" classname="Fields of Science and Technology classification"
schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies" inferred="true"
inferenceprovenance="update" provenanceaction="subject:fos">0302 clinical medicine
</subject>
<subject classid="FOS" classname="Fields of Science and Technology classification"
schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies" inferred="true"
inferenceprovenance="update" provenanceaction="subject:fos">030212 General &amp; Internal Medicine
</subject>
<subject classid="FOS" classname="Fields of Science and Technology classification"
schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies" inferred="true"
inferenceprovenance="update" provenanceaction="subject:fos">03021201 Health care/Health care quality - datum/health care
</subject>
<subject classid="FOS" classname="Fields of Science and Technology classification"
schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies" inferred="true"
inferenceprovenance="update" provenanceaction="subject:fos">03021201 Health care/Health care quality - datum/electronic health
</subject>
<subject classid="FOS" classname="Fields of Science and Technology classification"
schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies" inferred="true"
inferenceprovenance="update" provenanceaction="subject:fos">0301 basic medicine
</subject>
<subject classid="FOS" classname="Fields of Science and Technology classification"
schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies" inferred="true"
inferenceprovenance="update" provenanceaction="subject:fos">030104 Developmental Biology
</subject>
<subject classid="FOS" classname="Fields of Science and Technology classification"
schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies" inferred="true"
inferenceprovenance="update" provenanceaction="subject:fos">0303 health sciences
</subject>
<subject classid="FOS" classname="Fields of Science and Technology classification"
schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies" inferred="true"
inferenceprovenance="update" provenanceaction="subject:fos">030304 Developmental Biology
</subject>
<subject classid="FOS" classname="Fields of Science and Technology classification"
schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies" inferred="true"
inferenceprovenance="update" provenanceaction="subject:fos">030304 Developmental Biology - datum/datum management/ethical
</subject>
<subject classid="FOS" classname="Fields of Science and Technology classification"
schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies" inferred="true"
inferenceprovenance="update" provenanceaction="subject:fos">06 humanities and the arts</subject>
<subject classid="FOS" classname="Fields of Science and Technology classification"
schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies" inferred="true"
inferenceprovenance="update" provenanceaction="subject:fos">0603 philosophy, ethics and religion
</subject>
<subject classid="FOS" classname="Fields of Science and Technology classification"
schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies" inferred="true"
inferenceprovenance="update" provenanceaction="subject:fos">060301 Applied Ethics
</subject>
<subject classid="FOS" classname="Fields of Science and Technology classification"
schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies" inferred="true"
inferenceprovenance="update" provenanceaction="subject:fos">06030101 Bioethics/Coordinates on Wikidata - intelligence/artificial intelligence/ethical
</subject>
<subject classid="FOS" classname="Fields of Science and Technology classification"
schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies" inferred="true"
inferenceprovenance="update" provenanceaction="subject:fos">06030101 Bioethics/Coordinates on Wikidata - datum/ethical
</subject>
<language classid="eng" classname="English" schemeid="dnet:languages" schemename="dnet:languages" />
<relevantdate classid="created" classname="created" schemeid="dnet:dataCite_date" schemename="dnet:dataCite_date">2021-11-17</relevantdate>
<relevantdate classid="published-online" classname="published-online" schemeid="dnet:dataCite_date" schemename="dnet:dataCite_date">2022-01-31</relevantdate>
<relevantdate classid="issued" classname="issued" schemeid="dnet:dataCite_date" schemename="dnet:dataCite_date" inferred="false" provenanceaction="sysimport:crosswalk:datasetarchive" trust="0.9">2022-01-01</relevantdate>
<source>Crossref</source>
<source />
<source>Journal of Medical Internet Research, 24(1):e33081. Journal of medical Internet Research</source>
<source>urn:issn:1439-4456</source>
<source>VOLUME=24;ISSUE=1;ISSN=1439-4456;TITLE=Journal of Medical Internet Research</source>
<format>application/pdf</format>
<resulttype classid="publication" classname="publication" schemeid="dnet:result_typologies" schemename="dnet:result_typologies" />
<resourcetype classid="0001" classname="0001" schemeid="dnet:dataCite_resource" schemename="dnet:dataCite_resource" />
<context id="knowmad" label="Knowmad Institut" type="community" />
<context id="dth" label="Digital Twins in Health" type="community"/>
<datainfo>
<inferred>true</inferred>
<deletedbyinference>false</deletedbyinference>
<trust>0.8</trust>
<inferenceprovenance>dedup-result-decisiontree-v3</inferenceprovenance>
<provenanceaction classid="sysimport:dedup" classname="Inferred by OpenAIRE" schemeid="dnet:provenanceActions" schemename="dnet:provenanceActions" />
</datainfo>
<rels />
<children>
<result objidentifier="narcis______::3df5b8b060b819af0d439dd6751c8a77">
<title classid="main title" classname="main title" schemeid="dnet:dataCite_title" schemename="dnet:dataCite_title" inferred="false" provenanceaction="sysimport:crosswalk:repository" trust="0.9">Ethical Issues of Digital Twins for Personalized Health Care Service: Preliminary Mapping Study</title>
<collectedfrom name="NARCIS" id="openaire____::fdb035c8b3e0540a8d9a561a6c44f4de" />
<dateofacceptance>2022-01-01</dateofacceptance>
</result>
<result objidentifier="pmid________::70f4213809e44f787bf9f6ffa50ff98f">
<pid classid="pmc" classname="PubMed Central ID" schemeid="dnet:pid_types" schemename="dnet:pid_types" inferred="false" provenanceaction="sysimport:actionset" trust="0.9">PMC8844982</pid>
<collectedfrom name="Europe PubMed Central" id="opendoar____::8b6dd7db9af49e67306feb59a8bdc52c" />
<dateofacceptance>2021-08-23</dateofacceptance>
<title classid="main title" classname="main title" schemeid="dnet:dataCite_title" schemename="dnet:dataCite_title" inferred="false" provenanceaction="sysimport:actionset" trust="0.9">Ethical Issues of Digital Twins for Personalized Health Care Service: Preliminary Mapping Study.</title>
<pid classid="pmid" classname="PubMed ID" schemeid="dnet:pid_types" schemename="dnet:pid_types" inferred="false" provenanceaction="sysimport:actionset" trust="0.9">35099399</pid>
</result>
<result objidentifier="dris___00893::107e97e645cbb06fb7b454ce2569d6c2">
<title classid="subtitle" classname="subtitle" schemeid="dnet:dataCite_title" schemename="dnet:dataCite_title" inferred="false" provenanceaction="sysimport:crosswalk:datasetarchive" trust="0.9">Preliminary Mapping Study</title>
<dateofacceptance>2022-01-01</dateofacceptance>
<collectedfrom name="NARCIS" id="eurocrisdris::fe4903425d9040f680d8610d9079ea14" />
</result>
<result objidentifier="doi_________::88a9861b26cdda1c3dd162d11f0bedbe">
<dateofacceptance>2022-01-31</dateofacceptance>
<title classid="alternative title" classname="alternative title" schemeid="dnet:dataCite_title" schemename="dnet:dataCite_title">Mapping the Ethical Issues of Digital Twins for Personalised Healthcare Service (Preprint)</title>
<collectedfrom name="Crossref" id="openaire____::081b82f96300b6a6e3d282bad31cb6e2" />
<collectedfrom name="UnpayWall" id="openaire____::8ac8380272269217cb09a928c8caa993" />
<collectedfrom name="ORCID" id="openaire____::806360c771262b4d6770e7cdf04b5c5a" />
<collectedfrom name="Microsoft Academic Graph" id="openaire____::5f532a3fc4f1ea403f37070f59a7a53a" />
<pid classid="doi" classname="Digital Object Identifier" schemeid="dnet:pid_types" schemename="dnet:pid_types">10.2196/33081</pid>
<publisher>JMIR Publications Inc.</publisher>
</result>
<instance>
<accessright classid="OPEN" classname="Open Access" schemeid="dnet:access_modes" schemename="dnet:access_modes" />
<collectedfrom name="Europe PubMed Central" id="opendoar____::8b6dd7db9af49e67306feb59a8bdc52c" />
<hostedby name="Journal of Medical Internet Research" id="doajarticles::972bc6f95b43b5ad42f75695b1d8e948" />
<dateofacceptance>2021-08-23</dateofacceptance>
<instancetype classid="0001" classname="Article" schemeid="dnet:publication_resource" schemename="dnet:publication_resource" />
<pid classid="pmid" classname="PubMed ID" schemeid="dnet:pid_types" schemename="dnet:pid_types" inferred="false" provenanceaction="sysimport:actionset" trust="0.9">35099399</pid>
<pid classid="pmc" classname="PubMed Central ID" schemeid="dnet:pid_types" schemename="dnet:pid_types" inferred="false" provenanceaction="sysimport:actionset" trust="0.9">PMC8844982</pid>
<alternateidentifier classid="doi" classname="Digital Object Identifier" schemeid="dnet:pid_types" schemename="dnet:pid_types" inferred="false" provenanceaction="sysimport:actionset" trust="0.9">10.2196/33081</alternateidentifier>
<refereed classid="0000" classname="UNKNOWN" schemeid="dnet:review_levels" schemename="dnet:review_levels" />
<webresource>
<url>https://pubmed.ncbi.nlm.nih.gov/35099399</url>
</webresource>
</instance>
<instance>
<accessright classid="OPEN" classname="Open Access" schemeid="dnet:access_modes" schemename="dnet:access_modes" />
<collectedfrom name="Crossref" id="openaire____::081b82f96300b6a6e3d282bad31cb6e2" />
<collectedfrom name="NARCIS" id="eurocrisdris::fe4903425d9040f680d8610d9079ea14" />
<hostedby name="Journal of Medical Internet Research" id="doajarticles::972bc6f95b43b5ad42f75695b1d8e948" />
<dateofacceptance>2022-01-01</dateofacceptance>
<dateofacceptance>2022-01-31</dateofacceptance>
<instancetype classid="0001" classname="Article" schemeid="dnet:publication_resource" schemename="dnet:publication_resource" />
<pid classid="doi" classname="Digital Object Identifier" schemeid="dnet:pid_types" schemename="dnet:pid_types">10.2196/33081</pid>
<alternateidentifier classid="doi" classname="Digital Object Identifier" schemeid="dnet:pid_types" schemename="dnet:pid_types" inferred="false" provenanceaction="sysimport:crosswalk:datasetarchive" trust="0.9">10.2196/33081</alternateidentifier>
<alternateidentifier classid="urn" classname="urn" schemeid="dnet:pid_types" schemename="dnet:pid_types" inferred="false" provenanceaction="sysimport:crosswalk:datasetarchive" trust="0.9">urn:nbn:nl:ui:15-70bf9bd0-5ea6-45fd-bb62-8d79b48cd69f</alternateidentifier>
<refereed classid="0000" classname="UNKNOWN" schemeid="dnet:review_levels" schemename="dnet:review_levels" />
<webresource>
<url>https://doi.org/10.2196/33081</url>
</webresource>
</instance>
<instance>
<accessright classid="OPEN" classname="Open Access" schemeid="dnet:access_modes" schemename="dnet:access_modes" />
<collectedfrom name="NARCIS" id="openaire____::fdb035c8b3e0540a8d9a561a6c44f4de" />
<hostedby name="NARCIS" id="openaire____::fdb035c8b3e0540a8d9a561a6c44f4de" />
<dateofacceptance>2022-01-01</dateofacceptance>
<instancetype classid="0001" classname="Article" schemeid="dnet:publication_resource" schemename="dnet:publication_resource" />
<alternateidentifier classid="doi" classname="Digital Object Identifier" schemeid="dnet:pid_types" schemename="dnet:pid_types" inferred="false" provenanceaction="sysimport:crosswalk:repository" trust="0.9">10.2196/33081</alternateidentifier>
<alternateidentifier classid="urn" classname="urn" schemeid="dnet:pid_types" schemename="dnet:pid_types" inferred="false" provenanceaction="sysimport:crosswalk:repository" trust="0.9">urn:nbn:nl:ui:15-70bf9bd0-5ea6-45fd-bb62-8d79b48cd69f</alternateidentifier>
<refereed classid="0000" classname="UNKNOWN" schemeid="dnet:review_levels" schemename="dnet:review_levels" />
<webresource>
<url>https://pure.eur.nl/en/publications/70bf9bd0-5ea6-45fd-bb62-8d79b48cd69f</url>
</webresource>
</instance>
</children>
</oaf:result>
</oaf:entity>
</metadata>
</result>
</record>

View File

@ -0,0 +1,223 @@
<record>
<result xmlns:dri="http://www.driver-repository.eu/namespace/dri"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<header>
<dri:objIdentifier>doi_________::c166d06aeaed817937a79a400906a4b9</dri:objIdentifier>
<dri:dateOfCollection>2023-03-09T00:12:02.045Z</dri:dateOfCollection>
<dri:dateOfTransformation>2023-03-09T00:24:00.8Z</dri:dateOfTransformation>
</header>
<metadata>
<oaf:entity xmlns:oaf="http://namespace.openaire.eu/oaf"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://namespace.openaire.eu/oaf http://namespace.openaire.eu/oaf http://www.openaire.eu/schema/0.2/oaf-0.2.xsd">
<oaf:result>
<collectedfrom name="OpenAPC Global Initiative"
id="apc_________::e2b1600b229fc30663c8a1f662debddf"/>
<collectedfrom name="Crossref" id="openaire____::081b82f96300b6a6e3d282bad31cb6e2"/>
<collectedfrom name="UnpayWall" id="openaire____::8ac8380272269217cb09a928c8caa993"/>
<collectedfrom name="ORCID" id="openaire____::806360c771262b4d6770e7cdf04b5c5a"/>
<collectedfrom name="Microsoft Academic Graph" id="openaire____::5f532a3fc4f1ea403f37070f59a7a53a"/>
<originalId>50|openapc_____::c166d06aeaed817937a79a400906a4b9</originalId>
<originalId>10.3390/pr9111967</originalId>
<originalId>pr9111967</originalId>
<originalId>50|doiboost____::c166d06aeaed817937a79a400906a4b9</originalId>
<originalId>3209532762</originalId>
<pid classid="doi" classname="Digital Object Identifier" schemeid="dnet:pid_types"
schemename="dnet:pid_types" inferred="false"
provenanceaction="sysimport:crosswalk:datasetarchive" trust="0.9">10.3390/pr9111967
</pid>
<measure id="influence" score="5.187591E-9" class="C"/>
<measure id="popularity" score="2.4479982E-8" class="C"/>
<measure id="influence_alt" score="10" class="C"/>
<measure id="popularity_alt" score="3.6" class="C"/>
<measure id="impulse" score="10" class="C"/>
<title classid="alternative title" classname="alternative title" schemeid="dnet:dataCite_title"
schemename="dnet:dataCite_title">Digital Twins for Continuous mRNA Production
</title>
<bestaccessright classid="OPEN" classname="Open Access" schemeid="dnet:access_modes"
schemename="dnet:access_modes"/>
<dateofacceptance>2021-11-04</dateofacceptance>
<description>The global coronavirus pandemic continues to restrict public life worldwide. An
effective means of limiting the pandemic is vaccination. Messenger ribonucleic acid (mRNA)
vaccines currently available on the market have proven to be a well-tolerated and effective
class of vaccine against coronavirus type 2 (CoV2). Accordingly, demand is presently
outstripping mRNA vaccine production. One way to increase productivity is to switch from the
currently performed batch to continuous in vitro transcription, which has proven to be a crucial
material-consuming step. In this article, a physico-chemical model of in vitro mRNA
transcription in a tubular reactor is presented and compared to classical batch and continuous
in vitro transcription in a stirred tank. The three models are validated based on a distinct and
quantitative validation workflow. Statistically significant parameters are identified as part of
the parameter determination concept. Monte Carlo simulations showed that the model is precise,
with a deviation of less than 1%. The advantages of continuous production are pointed out
compared to batchwise in vitro transcription by optimization of the spacetime yield.
Improvements of a factor of 56 (0.011 µM/min) in the case of the continuously stirred tank
reactor (CSTR) and 68 (0.013 µM/min) in the case of the plug flow reactor (PFR) were found.
</description>
<subject classid="keyword" classname="keyword" schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies">Process Chemistry and Technology
</subject>
<subject classid="keyword" classname="keyword" schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies">Chemical Engineering (miscellaneous)
</subject>
<subject classid="keyword" classname="keyword" schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies">Bioengineering
</subject>
<subject classid="MAG" classname="Microsoft Academic Graph classification"
schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies">Coronavirus
</subject>
<subject classid="MAG" classname="Microsoft Academic Graph classification"
schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies" inferred="false"
provenanceaction="sysimport:actionset" trust="0.4019353">medicine.disease_cause
</subject>
<subject classid="MAG" classname="Microsoft Academic Graph classification"
schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies" inferred="false"
provenanceaction="sysimport:actionset" trust="0.4019353">medicine
</subject>
<subject classid="MAG" classname="Microsoft Academic Graph classification"
schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies">Continuous production
</subject>
<subject classid="MAG" classname="Microsoft Academic Graph classification"
schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies">Messenger RNA
</subject>
<subject classid="MAG" classname="Microsoft Academic Graph classification"
schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies">Continuous stirred-tank reactor
</subject>
<subject classid="MAG" classname="Microsoft Academic Graph classification"
schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies">Plug flow reactor model
</subject>
<subject classid="MAG" classname="Microsoft Academic Graph classification"
schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies">Mathematics
</subject>
<subject classid="MAG" classname="Microsoft Academic Graph classification"
schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies">In vitro transcription
</subject>
<subject classid="MAG" classname="Microsoft Academic Graph classification"
schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies">Biological system
</subject>
<subject classid="MAG" classname="Microsoft Academic Graph classification"
schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies">Yield (chemistry)
</subject>
<subject classid="MAG" classname="Microsoft Academic Graph classification"
schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies">Public life
</subject>
<subject classid="FOS" classname="Fields of Science and Technology classification"
schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies" inferred="true"
inferenceprovenance="update" provenanceaction="subject:fos">03 medical and health sciences
</subject>
<subject classid="FOS" classname="Fields of Science and Technology classification"
schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies" inferred="true"
inferenceprovenance="update" provenanceaction="subject:fos">0301 basic medicine
</subject>
<subject classid="FOS" classname="Fields of Science and Technology classification"
schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies" inferred="true"
inferenceprovenance="update" provenanceaction="subject:fos">030104 Developmental Biology
</subject>
<subject classid="FOS" classname="Fields of Science and Technology classification"
schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies" inferred="true"
inferenceprovenance="update" provenanceaction="subject:fos">0303 health sciences
</subject>
<subject classid="FOS" classname="Fields of Science and Technology classification"
schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies" inferred="true"
inferenceprovenance="update" provenanceaction="subject:fos">030304 Developmental Biology
</subject>
<subject classid="FOS" classname="Fields of Science and Technology classification"
schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies" inferred="true"
inferenceprovenance="update" provenanceaction="subject:fos">02 engineering and technology
</subject>
<subject classid="FOS" classname="Fields of Science and Technology classification"
schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies" inferred="true"
inferenceprovenance="update" provenanceaction="subject:fos">0210 nano-technology
</subject>
<subject classid="FOS" classname="Fields of Science and Technology classification"
schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies" inferred="true"
inferenceprovenance="update" provenanceaction="subject:fos">021001 Nanoscience &amp;
Nanotechnology
</subject>
<subject classid="FOS" classname="Fields of Science and Technology classification"
schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies" inferred="true"
inferenceprovenance="update" provenanceaction="subject:fos">01 natural sciences
</subject>
<subject classid="FOS" classname="Fields of Science and Technology classification"
schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies" inferred="true"
inferenceprovenance="update" provenanceaction="subject:fos">0104 chemical sciences
</subject>
<subject classid="FOS" classname="Fields of Science and Technology classification"
schemeid="dnet:subject_classification_typologies"
schemename="dnet:subject_classification_typologies" inferred="true"
inferenceprovenance="update" provenanceaction="subject:fos">010405 Organic Chemistry
</subject>
<language classid="UNKNOWN" classname="UNKNOWN" schemeid="dnet:languages"
schemename="dnet:languages"/>
<relevantdate classid="created" classname="created" schemeid="dnet:dataCite_date"
schemename="dnet:dataCite_date">2021-11-05
</relevantdate>
<relevantdate classid="published-online" classname="published-online" schemeid="dnet:dataCite_date"
schemename="dnet:dataCite_date">2021-11-04
</relevantdate>
<source>Crossref</source>
<source/>
<resulttype classid="publication" classname="publication" schemeid="dnet:result_typologies"
schemename="dnet:result_typologies"/>
<resourcetype classid="0001" classname="Article" schemeid="dnet:publication_resource"
schemename="dnet:publication_resource"/>
<processingchargeamount>2146.08</processingchargeamount>
<processingchargecurrency>EUR</processingchargecurrency>
<journal issn="2227-9717">Processes</journal>
<context id="covid-19" label="COVID-19" type="community"/>
<context id="dth" label="Digital Twins in Health" type="community"/>
<datainfo>
<inferred>false</inferred>
<deletedbyinference>false</deletedbyinference>
<trust>0.9</trust>
<inferenceprovenance>null</inferenceprovenance>
<provenanceaction classid="sysimport:actionset" classname="Harvested"
schemeid="dnet:provenanceActions" schemename="dnet:provenanceActions"/>
</datainfo>
<rels/>
<children>
<instance>
<accessright classid="OPEN" classname="Open Access" schemeid="dnet:access_modes"
schemename="dnet:access_modes"/>
<collectedfrom name="Crossref" id="openaire____::081b82f96300b6a6e3d282bad31cb6e2"/>
<hostedby name="Processes" id="doajarticles::9c4841602dcae937ac9c86cfb03c5ee1"/>
<dateofacceptance>2021-11-04</dateofacceptance>
<instancetype classid="0001" classname="Article" schemeid="dnet:publication_resource"
schemename="dnet:publication_resource"/>
<pid classid="doi" classname="Digital Object Identifier" schemeid="dnet:pid_types"
schemename="dnet:pid_types">10.3390/pr9111967
</pid>
<refereed classid="0000" classname="UNKNOWN" schemeid="dnet:review_levels"
schemename="dnet:review_levels"/>
<license>https://creativecommons.org/licenses/by/4.0/</license>
<webresource>
<url>https://doi.org/10.3390/pr9111967</url>
</webresource>
</instance>
</children>
</oaf:result>
</oaf:entity>
</metadata>
</result>
</record>

View File

@ -0,0 +1,138 @@
<record>
<result xmlns:dri="http://www.driver-repository.eu/namespace/dri"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<header>
<dri:objIdentifier>doi_dedup___::10a910f4a66b7f4bce8407d7a486a80a</dri:objIdentifier>
<dri:dateOfCollection>2023-04-05T00:36:27+0000</dri:dateOfCollection>
<dri:dateOfTransformation>2023-04-05T07:33:52.185Z</dri:dateOfTransformation>
</header>
<metadata>
<oaf:entity xmlns:oaf="http://namespace.openaire.eu/oaf"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://namespace.openaire.eu/oaf http://namespace.openaire.eu/oaf http://www.openaire.eu/schema/0.2/oaf-0.2.xsd">
<oaf:result>
<collectedfrom name="Datacite" id="openaire____::9e3be59865b2c1c335d32dae2fe7b254" />
<originalId>50|datacite____::10a910f4a66b7f4bce8407d7a486a80a</originalId>
<originalId>10.5281/zenodo.6967373</originalId>
<originalId>50|datacite____::172969c66c312a9656fc745f0ec62ce5</originalId>
<originalId>10.5281/zenodo.6969999</originalId>
<originalId>50|datacite____::4fa8f1c89ff11e8e99f9ded870ade80d</originalId>
<originalId>10.5281/zenodo.6967372</originalId>
<originalId>50|datacite____::a466b6173773d742b7a5881682748a8c</originalId>
<originalId>10.5281/zenodo.6970067</originalId>
<pid classid="doi" classname="Digital Object Identifier" schemeid="dnet:pid_types" schemename="dnet:pid_types" inferred="false" provenanceaction="sysimport:actionset" trust="0.9">10.5281/zenodo.6967373</pid>
<pid classid="doi" classname="Digital Object Identifier" schemeid="dnet:pid_types" schemename="dnet:pid_types" inferred="false" provenanceaction="sysimport:actionset" trust="0.9">10.5281/zenodo.6969999</pid>
<pid classid="doi" classname="Digital Object Identifier" schemeid="dnet:pid_types" schemename="dnet:pid_types" inferred="false" provenanceaction="sysimport:actionset" trust="0.9">10.5281/zenodo.6967372</pid>
<pid classid="doi" classname="Digital Object Identifier" schemeid="dnet:pid_types" schemename="dnet:pid_types" inferred="false" provenanceaction="sysimport:actionset" trust="0.9">10.5281/zenodo.6970067</pid>
<title classid="main title" classname="main title" schemeid="dnet:dataCite_title" schemename="dnet:dataCite_title">Sentinel-3 NDVI ARD and Long Term Statistics (1999-2019) from the Copernicus Global Land Service over Lombardia</title>
<bestaccessright classid="OPEN" classname="Open Access" schemeid="dnet:access_modes" schemename="dnet:access_modes" />
<creator rank="1">Marasco Pier Lorenzo</creator>
<dateofacceptance>2022-08-05</dateofacceptance>
<description>Sentinel-3 NDVI Analysis Ready Data (ARD) (C_GLS_NDVI_20220101_20220701_Lombardia_S3_2.nc) product provided by the Copernicus Global Land Service [3]. The file C_GLS_NDVI_20220101_20220701_Lombardia_S3_2_masked.nc is derived from C_GLS_NDVI_20220101_20220701_Lombardia_S3_2.nc but values have been scaled (raw_value * ( 1/250) - 0.08) and values lower then -0.08 and greater than 0.92 have been removed (set to missing values). The original dataset can also be discovered through the OpenEO API[5] from the CGLS distributor VITO [4]. Access is free of charge but an EGI registration is needed. The file called Italy.geojson has been created using the Global Administrative Unit Layers GAUL G2015_2014 provided by FAO-UN (see Documentation). It only contains information related to Italy. Further info about drought indexes can be found in the Integrated Drought Management Programme [5] [1] Application of vegetation index and brightness temperature for drought detection [2] NDVI [3] Copernicus Global Land Service [4] Vito [5] OpenEO [5] Integrated Drought Management</description>
<description>These datasets are used for training purposes. See https://pangeo-data.github.io/foss4g-2022/intro.html</description>
<subject classid="keyword" classname="keyword" schemeid="dnet:subject_classification_typologies" schemename="dnet:subject_classification_typologies">NDVI</subject>
<subject classid="keyword" classname="keyword" schemeid="dnet:subject_classification_typologies" schemename="dnet:subject_classification_typologies">vegetaion</subject>
<subject classid="keyword" classname="keyword" schemeid="dnet:subject_classification_typologies" schemename="dnet:subject_classification_typologies">Copernicus Global Land Service</subject>
<subject classid="keyword" classname="keyword" schemeid="dnet:subject_classification_typologies" schemename="dnet:subject_classification_typologies">pangeo</subject>
<language classid="eng" classname="English" schemeid="dnet:languages" schemename="dnet:languages" />
<relevantdate classid="issued" classname="issued" schemeid="dnet:dataCite_date" schemename="dnet:dataCite_date">2022-08-05</relevantdate>
<publisher>Zenodo</publisher>
<resulttype classid="dataset" classname="dataset" schemeid="dnet:result_typologies" schemename="dnet:result_typologies" />
<resourcetype classid="UNKNOWN" classname="Unknown" schemeid="dnet:dataCite_resource" schemename="dnet:dataCite_resource" />
<eoscifguidelines code="EOSC::Jupyter Notebook" label="EOSC::Jupyter Notebook" semanticrelation="compliesWith" />
<datainfo>
<inferred>true</inferred>
<deletedbyinference>false</deletedbyinference>
<trust>0.8</trust>
<inferenceprovenance>dedup-result-decisiontree-v3</inferenceprovenance>
<provenanceaction classid="sysimport:dedup" classname="Inferred by OpenAIRE" schemeid="dnet:provenanceActions" schemename="dnet:provenanceActions" />
</datainfo>
<rels></rels>
<children>
<result objidentifier="doi_________::4fa8f1c89ff11e8e99f9ded870ade80d">
<publisher>Zenodo</publisher>
<pid classid="doi" classname="Digital Object Identifier" schemeid="dnet:pid_types" schemename="dnet:pid_types" inferred="false" provenanceaction="sysimport:actionset" trust="0.9">10.5281/zenodo.6967372</pid>
<dateofacceptance>2022-08-05</dateofacceptance>
<collectedfrom name="Datacite" id="openaire____::9e3be59865b2c1c335d32dae2fe7b254" />
<title classid="main title" classname="main title" schemeid="dnet:dataCite_title" schemename="dnet:dataCite_title">Sentinel-3 NDVI ARD and Long Term Statistics (1999-2019) from the Copernicus Global Land Service over Lombardia</title>
</result>
<result objidentifier="doi_________::a466b6173773d742b7a5881682748a8c">
<publisher>Zenodo</publisher>
<pid classid="doi" classname="Digital Object Identifier" schemeid="dnet:pid_types" schemename="dnet:pid_types" inferred="false" provenanceaction="sysimport:actionset" trust="0.9">10.5281/zenodo.6970067</pid>
<dateofacceptance>2022-08-05</dateofacceptance>
<collectedfrom name="Datacite" id="openaire____::9e3be59865b2c1c335d32dae2fe7b254" />
<title classid="main title" classname="main title" schemeid="dnet:dataCite_title" schemename="dnet:dataCite_title">Sentinel-3 NDVI ARD and Long Term Statistics (1999-2019) from the Copernicus Global Land Service over Lombardia</title>
</result>
<result objidentifier="doi_________::172969c66c312a9656fc745f0ec62ce5">
<publisher>Zenodo</publisher>
<dateofacceptance>2022-08-05</dateofacceptance>
<pid classid="doi" classname="Digital Object Identifier" schemeid="dnet:pid_types" schemename="dnet:pid_types" inferred="false" provenanceaction="sysimport:actionset" trust="0.9">10.5281/zenodo.6969999</pid>
<collectedfrom name="Datacite" id="openaire____::9e3be59865b2c1c335d32dae2fe7b254" />
<title classid="main title" classname="main title" schemeid="dnet:dataCite_title" schemename="dnet:dataCite_title">Sentinel-3 NDVI ARD and Long Term Statistics (1999-2019) from the Copernicus Global Land Service over Lombardia</title>
</result>
<result objidentifier="doi_________::10a910f4a66b7f4bce8407d7a486a80a">
<publisher>Zenodo</publisher>
<dateofacceptance>2022-08-05</dateofacceptance>
<collectedfrom name="Datacite" id="openaire____::9e3be59865b2c1c335d32dae2fe7b254" />
<title classid="main title" classname="main title" schemeid="dnet:dataCite_title" schemename="dnet:dataCite_title">Sentinel-3 NDVI ARD and Long Term Statistics (1999-2019) from the Copernicus Global Land Service over Lombardia</title>
<pid classid="doi" classname="Digital Object Identifier" schemeid="dnet:pid_types" schemename="dnet:pid_types" inferred="false" provenanceaction="sysimport:actionset" trust="0.9">10.5281/zenodo.6967373</pid>
</result>
<instance>
<accessright classid="OPEN" classname="Open Access" schemeid="dnet:access_modes" schemename="dnet:access_modes" />
<collectedfrom name="Datacite" id="openaire____::9e3be59865b2c1c335d32dae2fe7b254" />
<hostedby name="ZENODO" id="opendoar____::358aee4cc897452c00244351e4d91f69" />
<dateofacceptance>2022-08-05</dateofacceptance>
<instancetype classid="0021" classname="Dataset" schemeid="dnet:publication_resource" schemename="dnet:publication_resource" />
<pid classid="doi" classname="Digital Object Identifier" schemeid="dnet:pid_types" schemename="dnet:pid_types" inferred="false" provenanceaction="sysimport:actionset" trust="0.9">10.5281/zenodo.6967373</pid>
<refereed classid="0000" classname="UNKNOWN" schemeid="dnet:review_levels" schemename="dnet:review_levels" />
<license>https://creativecommons.org/licenses/by/4.0/legalcode</license>
<webresource>
<url>https://doi.org/10.5281/zenodo.6967373</url>
</webresource>
</instance>
<instance>
<accessright classid="OPEN" classname="Open Access" schemeid="dnet:access_modes" schemename="dnet:access_modes" />
<collectedfrom name="Datacite" id="openaire____::9e3be59865b2c1c335d32dae2fe7b254" />
<hostedby name="ZENODO" id="opendoar____::358aee4cc897452c00244351e4d91f69" />
<dateofacceptance>2022-08-05</dateofacceptance>
<instancetype classid="0021" classname="Dataset" schemeid="dnet:publication_resource" schemename="dnet:publication_resource" />
<pid classid="doi" classname="Digital Object Identifier" schemeid="dnet:pid_types" schemename="dnet:pid_types" inferred="false" provenanceaction="sysimport:actionset" trust="0.9">10.5281/zenodo.6970067</pid>
<refereed classid="0000" classname="UNKNOWN" schemeid="dnet:review_levels" schemename="dnet:review_levels" />
<license>https://creativecommons.org/licenses/by/4.0/legalcode</license>
<webresource>
<url>https://doi.org/10.5281/zenodo.6970067</url>
</webresource>
</instance>
<instance>
<accessright classid="OPEN" classname="Open Access" schemeid="dnet:access_modes" schemename="dnet:access_modes" />
<collectedfrom name="Datacite" id="openaire____::9e3be59865b2c1c335d32dae2fe7b254" />
<hostedby name="ZENODO" id="opendoar____::358aee4cc897452c00244351e4d91f69" />
<dateofacceptance>2022-08-05</dateofacceptance>
<instancetype classid="0021" classname="Dataset" schemeid="dnet:publication_resource" schemename="dnet:publication_resource" />
<pid classid="doi" classname="Digital Object Identifier" schemeid="dnet:pid_types" schemename="dnet:pid_types" inferred="false" provenanceaction="sysimport:actionset" trust="0.9">10.5281/zenodo.6969999</pid>
<refereed classid="0000" classname="UNKNOWN" schemeid="dnet:review_levels" schemename="dnet:review_levels" />
<license>https://creativecommons.org/licenses/by/4.0/legalcode</license>
<webresource>
<url>https://doi.org/10.5281/zenodo.6969999</url>
</webresource>
</instance>
<instance>
<accessright classid="OPEN" classname="Open Access" schemeid="dnet:access_modes" schemename="dnet:access_modes" />
<collectedfrom name="Datacite" id="openaire____::9e3be59865b2c1c335d32dae2fe7b254" />
<hostedby name="ZENODO" id="opendoar____::358aee4cc897452c00244351e4d91f69" />
<dateofacceptance>2022-08-05</dateofacceptance>
<instancetype classid="0021" classname="Dataset" schemeid="dnet:publication_resource" schemename="dnet:publication_resource" />
<pid classid="doi" classname="Digital Object Identifier" schemeid="dnet:pid_types" schemename="dnet:pid_types" inferred="false" provenanceaction="sysimport:actionset" trust="0.9">10.5281/zenodo.6967372</pid>
<refereed classid="0000" classname="UNKNOWN" schemeid="dnet:review_levels" schemename="dnet:review_levels" />
<license>https://creativecommons.org/licenses/by/4.0/legalcode</license>
<webresource>
<url>https://doi.org/10.5281/zenodo.6967372</url>
</webresource>
</instance>
</children>
</oaf:result>
</oaf:entity>
</metadata>
</result>
</record>

View File

@ -1,10 +1,10 @@
{
"eoscifguidelines": [
{
"code" : "EOSC::Jupyter Notebook",
"label" : "EOSC::Jupyter Notebook",
"url" : "",
"semanticRelation" : "compliesWith"
"code": "EOSC::Jupyter Notebook",
"label": "EOSC::Jupyter Notebook",
"url": "",
"semanticRelation": "compliesWith"
}
],
"measures": [
@ -431,7 +431,26 @@
"value": "VIRTA"
}
],
"context": [],
"context": [
{
"id": "eosc",
"dataInfo": [
{
"invisible": false,
"inferred": false,
"deletedbyinference": false,
"trust": "0.9",
"inferenceprovenance": "",
"provenanceaction": {
"classid": "sysimport:crosswalk",
"classname": "sysimport:crosswalk",
"schemeid": "dnet:provenanceActions",
"schemename": "dnet:provenanceActions"
}
}
]
}
],
"contributor": [],
"country": [],
"coverage": [],

View File

@ -39,7 +39,8 @@
<switch>
<!-- The default will be set as the normal start, a.k.a. get-doi-synonyms -->
<!-- If any different condition is set, go to the corresponding start -->
<case to="non-iterative-rankings">${wf:conf('resume') eq "rankings-start"}</case>
<case to="spark-cc">${wf:conf('resume') eq "cc"}</case>
<case to="spark-ram">${wf:conf('resume') eq "ram"}</case>
<case to="spark-impulse">${wf:conf('resume') eq "impulse"}</case>
<case to="spark-pagerank">${wf:conf('resume') eq "pagerank"}</case>
<case to="spark-attrank">${wf:conf('resume') eq "attrank"}</case>
@ -89,18 +90,11 @@
<file>${nameNode}${wfAppPath}/create_openaire_ranking_graph.py#create_openaire_ranking_graph.py</file>
</spark>
<ok to="non-iterative-rankings" />
<ok to="spark-cc"/>
<error to="openaire-graph-error" />
</action>
<!-- Citation Count and RAM are calculated in parallel-->
<fork name="non-iterative-rankings">
<path start="spark-cc"/>
<!-- <path start="spark-impulse"/> -->
<path start="spark-ram"/>
</fork>
<!-- Run Citation Count calculation -->
<action name="spark-cc">
<spark xmlns="uri:oozie:spark-action:0.2">
@ -129,7 +123,7 @@
<file>${wfAppPath}/bip-ranker/CC.py#CC.py</file>
</spark>
<ok to="join-non-iterative-rankings" />
<ok to="spark-ram" />
<error to="cc-fail" />
</action>
@ -165,14 +159,11 @@
<file>${wfAppPath}/bip-ranker/TAR.py#TAR.py</file>
</spark>
<ok to="join-non-iterative-rankings" />
<ok to="spark-impulse" />
<error to="ram-fail" />
</action>
<!-- Join non-iterative methods -->
<join name="join-non-iterative-rankings" to="spark-impulse"/>
<action name="spark-impulse">
<spark xmlns="uri:oozie:spark-action:0.2">

View File

@ -1,111 +0,0 @@
General notes
====================
Oozie-installer is a utility allowing building, uploading and running oozie workflows. In practice, it creates a `*.tar.gz` package that contains resouces that define a workflow and some helper scripts.
This module is automatically executed when running:
`mvn package -Poozie-package -Dworkflow.source.dir=classpath/to/parent/directory/of/oozie_app`
on module having set:
<parent>
<groupId>eu.dnetlib.dhp</groupId>
<artifactId>dhp-workflows</artifactId>
</parent>
in `pom.xml` file. `oozie-package` profile initializes oozie workflow packaging, `workflow.source.dir` property points to a workflow (notice: this is not a relative path but a classpath to directory usually holding `oozie_app` subdirectory).
The outcome of this packaging is `oozie-package.tar.gz` file containing inside all the resources required to run Oozie workflow:
- jar packages
- workflow definitions
- job properties
- maintenance scripts
Required properties
====================
In order to include proper workflow within package, `workflow.source.dir` property has to be set. It could be provided by setting `-Dworkflow.source.dir=some/job/dir` maven parameter.
In oder to define full set of cluster environment properties one should create `~/.dhp/application.properties` file with the following properties:
- `dhp.hadoop.frontend.user.name` - your user name on hadoop cluster and frontend machine
- `dhp.hadoop.frontend.host.name` - frontend host name
- `dhp.hadoop.frontend.temp.dir` - frontend directory for temporary files
- `dhp.hadoop.frontend.port.ssh` - frontend machine ssh port
- `oozieServiceLoc` - oozie service location required by run_workflow.sh script executing oozie job
- `nameNode` - name node address
- `jobTracker` - job tracker address
- `oozie.execution.log.file.location` - location of file that will be created when executing oozie job, it contains output produced by `run_workflow.sh` script (needed to obtain oozie job id)
- `maven.executable` - mvn command location, requires parameterization due to a different setup of CI cluster
- `sparkDriverMemory` - amount of memory assigned to spark jobs driver
- `sparkExecutorMemory` - amount of memory assigned to spark jobs executors
- `sparkExecutorCores` - number of cores assigned to spark jobs executors
All values will be overriden with the ones from `job.properties` and eventually `job-override.properties` stored in module's main folder.
When overriding properties from `job.properties`, `job-override.properties` file can be created in main module directory (the one containing `pom.xml` file) and define all new properties which will override existing properties. One can provide those properties one by one as command line -D arguments.
Properties overriding order is the following:
1. `pom.xml` defined properties (located in the project root dir)
2. `~/.dhp/application.properties` defined properties
3. `${workflow.source.dir}/job.properties`
4. `job-override.properties` (located in the project root dir)
5. `maven -Dparam=value`
where the maven `-Dparam` property is overriding all the other ones.
Workflow definition requirements
====================
`workflow.source.dir` property should point to the following directory structure:
[${workflow.source.dir}]
|
|-job.properties (optional)
|
\-[oozie_app]
|
\-workflow.xml
This property can be set using maven `-D` switch.
`[oozie_app]` is the default directory name however it can be set to any value as soon as `oozieAppDir` property is provided with directory name as value.
Subworkflows are supported as well and subworkflow directories should be nested within `[oozie_app]` directory.
Creating oozie installer step-by-step
=====================================
Automated oozie-installer steps are the following:
1. creating jar packages: `*.jar` and `*tests.jar` along with copying all dependancies in `target/dependencies`
2. reading properties from maven, `~/.dhp/application.properties`, `job.properties`, `job-override.properties`
3. invoking priming mechanism linking resources from import.txt file (currently resolving subworkflow resources)
4. assembling shell scripts for preparing Hadoop filesystem, uploading Oozie application and starting workflow
5. copying whole `${workflow.source.dir}` content to `target/${oozie.package.file.name}`
6. generating updated `job.properties` file in `target/${oozie.package.file.name}` based on maven, `~/.dhp/application.properties`, `job.properties` and `job-override.properties`
7. creating `lib` directory (or multiple directories for subworkflows for each nested directory) and copying jar packages created at step (1) to each one of them
8. bundling whole `${oozie.package.file.name}` directory into single tar.gz package
Uploading oozie package and running workflow on cluster
=======================================================
In order to simplify deployment and execution process two dedicated profiles were introduced:
- `deploy`
- `run`
to be used along with `oozie-package` profile e.g. by providing `-Poozie-package,deploy,run` maven parameters.
`deploy` profile supplements packaging process with:
1) uploading oozie-package via scp to `/home/${user.name}/oozie-packages` directory on `${dhp.hadoop.frontend.host.name}` machine
2) extracting uploaded package
3) uploading oozie content to hadoop cluster HDFS location defined in `oozie.wf.application.path` property (generated dynamically by maven build process, based on `${dhp.hadoop.frontend.user.name}` and `workflow.source.dir` properties)
`run` profile introduces:
1) executing oozie application uploaded to HDFS cluster using `deploy` command. Triggers `run_workflow.sh` script providing runtime properties defined in `job.properties` file.
Notice: ssh access to frontend machine has to be configured on system level and it is preferable to set key-based authentication in order to simplify remote operations.

View File

@ -25,7 +25,6 @@
<modules>
<module>dhp-workflow-profiles</module>
<module>dhp-aggregation</module>
<module>dhp-distcp</module>
<module>dhp-actionmanager</module>
<module>dhp-graph-mapper</module>
<module>dhp-dedup-openaire</module>