Commit Graph

3190 Commits

Author SHA1 Message Date
Miriam Baglioni 2480e590d1 [DOIBoost - Mapping] changed the type on which to map dissertation from Crossref: from 006 Doctoral thesis to 0044 Thesis, since a dissertation could be either a doctoral or a master's thesis 2021-11-03 14:25:23 +01:00
Miriam Baglioni b9d124bb7c [Enrichment: Propagation through parent-child relationships] Added counters, and changed constraint to verify if filtering out the relation (from classname = harvested to classid != propagation) 2021-11-03 13:55:37 +01:00
Sandro La Bruzzo 7bd224f051 implement first version of scholexplorer integration for the generation of final graph 2021-11-02 15:58:15 +01:00
Antonis Lempesis b97b78f874 removed hardcoded reference 2021-11-02 09:12:49 +01:00
Claudio Atzori 7fa49f6956 Merge pull request 'removed hardcoded reference' (#154) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: #154
2021-11-02 09:11:30 +01:00
Antonis Lempesis f78afb5ef9 removed hardcoded reference 2021-11-01 15:42:29 +02:00
Miriam Baglioni 2aca6bfa0a merging with branch beta 2021-10-29 11:20:45 +02:00
Miriam Baglioni 09f36cffb8 [Enrichment: Propagation through parent-child relationships] First implementation, testing, and wf for propagation of result to organization through semantic relation 2021-10-29 11:20:03 +02:00
Claudio Atzori 1225ba0b92 [resolution] increasing number of partitions to avoid OOM 2021-10-28 16:18:17 +02:00
Sandro La Bruzzo d9cbca83f7 moved filter on next phase 2021-10-28 16:13:24 +02:00
Claudio Atzori d02caef185 Merge branch 'beta' into hierarchical_orgs_relations 2021-10-27 15:36:29 +02:00
Sandro La Bruzzo 1be9aa0a5f Removed filter of datacite items from the raw graph merging phase, Datacite is not an actionset anymore in beta 2021-10-26 17:52:20 +02:00
Sandro La Bruzzo 4acfa8fa2e Scholexplorer Datasource Aggregation:
- Added collectedfrom in the inverse relation generated
Relation resolution:
- increased number of partitions in workflow.xml
- using classid instead of classname to build the pid-dnetId mapping
2021-10-26 17:51:20 +02:00
Miriam Baglioni d0ef7d91c5 adding test resource 2021-10-26 17:34:11 +02:00
Sandro La Bruzzo 034304b33a conflict resolved on merge 2021-10-26 09:40:47 +02:00
Michele Artini d66e20e7ac added hierarchy rel in ROR actionset 2021-10-21 15:51:48 +02:00
Claudio Atzori d147295c2f avoiding java.io.NotSerializableException: java.util.HashMap 2021-10-21 14:15:57 +02:00
Claudio Atzori 3702fe478d cleanup 2021-10-21 12:05:02 +02:00
Sandro La Bruzzo ac36aa7d1c fixed wrong Encoding during a map phase 2021-10-21 11:35:02 +02:00
Sandro La Bruzzo aeeebd573b code refactor renamed datacite package 2021-10-20 17:37:42 +02:00
Sandro La Bruzzo ab3a99d3e9 removed old datacite oozie workflow 2021-10-20 17:19:47 +02:00
Sandro La Bruzzo ae4e99a471 Adapted workflow of resolution of PID to work into OpenAIRE data workflow
- Added relations in both directions on all Scholexplorer datasources
2021-10-20 17:12:16 +02:00
Claudio Atzori cece432adc [stats] reducing the step22 wait time 2021-10-20 14:16:33 +02:00
Antonis Lempesis a7376907c2 invalidating metadata before context thingies 2021-10-20 14:16:25 +02:00
Antonis Lempesis 43f4eb492b fetching affiliated results for 4 orgs in monitor. fixed affiliated orgs in stats db 2021-10-20 14:16:11 +02:00
Claudio Atzori 4f8970f8ed [stats] reducing the step22 wait time 2021-10-20 14:14:53 +02:00
Claudio Atzori 00b78b9c58 cleanup: mapping contents in the graph already defined in the OAF graph model doesn't require to be aware of the vocabularies 2021-10-20 14:04:45 +02:00
Claudio Atzori c01dd0c925 registered oaf model classes for the KryoSerializer 2021-10-20 13:55:07 +02:00
Miriam Baglioni 652114c641 [affiliationPropagation] first try. preparation 2021-10-20 11:44:23 +02:00
Claudio Atzori 59f76b50d4 Merge branch 'beta' into hierarchical_orgs_relations 2021-10-20 09:42:35 +02:00
Antonis Lempesis 241dcf6df1 Merge branch 'beta' into beta 2021-10-19 23:54:21 +02:00
Claudio Atzori 515e068a78 Merge branch 'beta' into hierarchical_orgs_relations 2021-10-19 16:46:06 +02:00
Claudio Atzori 512e7b0170 code formatting 2021-10-19 16:19:29 +02:00
Michele Artini c4fce785ab fixed a compilation problem of a unit test 2021-10-19 16:18:26 +02:00
Claudio Atzori e9157c67aa Merge branch 'beta' into dump 2021-10-19 16:15:03 +02:00
Claudio Atzori 98f37c8d81 WIP: workflow nodes for including Scholexplorer records in the RAW graph 2021-10-19 16:14:40 +02:00
Claudio Atzori c8850456e9 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2021-10-19 16:09:54 +02:00
Claudio Atzori 172363e7f1 [broker] integrating PR#147, notification record creation phase separated from indexing on ES 2021-10-19 15:56:27 +02:00
Claudio Atzori bdffa86c2f undo last commit 2021-10-19 15:39:38 +02:00
Sandro La Bruzzo c9870c5122 code formatted 2021-10-19 15:24:59 +02:00
Sandro La Bruzzo f8329bc110 since dhp-schemas changed, introducing new Relation inverse model, this class has been updated 2021-10-19 15:24:22 +02:00
Claudio Atzori e471f12d5e hotfix: recovered implementation removing the hardcoded working_dirs 2021-10-19 12:35:38 +02:00
Claudio Atzori 7a73010acd WIP: workflow nodes for including Scholexplorer records in the RAW graph 2021-10-19 11:59:16 +02:00
Miriam Baglioni c7f6cd2591 added again the setting for saXReader 2021-10-19 10:15:26 +02:00
miconis 5f780a6ba1 bug fix in migrate entities: parameter name was wrong 2021-10-18 23:30:40 +02:00
Miriam Baglioni 1315952702 merge with branch beta 2021-10-18 14:17:09 +02:00
Miriam Baglioni 1cc09adfaa Opencitations: changed the test class to mirror the creation or not of duplicate dois for .refs oc original, plus added an optional parameter to duplicate the relation 2021-10-18 14:11:27 +02:00
Miriam Baglioni 76d41602be Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2021-10-18 10:53:22 +02:00
Miriam Baglioni 46f82c7c8f removed not needed folder deletion 2021-10-18 10:53:16 +02:00
Sandro La Bruzzo 7b15b88d4c renamed wrong package, implemented last aggregation workflow for scholexplorer 2021-10-15 15:00:15 +02:00
Antonis Lempesis 41ecb1eb61 invalidating metadata before context thingies 2021-10-15 13:42:55 +03:00
Antonis Lempesis 4b7c8dff2d fetching affiliated results for 4 orgs in monitor. fixed affiliated orgs in stats db 2021-10-14 18:53:35 +03:00
Claudio Atzori e15a1969a5 applying fix on the DOIBoost construction process that somehow wasn't part of the merge done in 83c90c7180 2021-10-14 14:33:56 +02:00
Sandro La Bruzzo 51a03c0a50 refactor code for EBI from dhp-graph-mapper into dhp-aggregation 2021-10-14 14:23:13 +02:00
Claudio Atzori 14fbf92ad6 Merge branch 'beta' into beta_solr_config 2021-10-14 11:08:44 +02:00
Miriam Baglioni 4b1920f008 changed the working path parameter value to be dependent on the dnet-workflow working dir parameter 2021-10-14 09:18:09 +02:00
Miriam Baglioni 8db39c86e2 added new parameter in the doiboost process workflow to specify a folder for the process of MAG dataset 2021-10-14 09:17:39 +02:00
Claudio Atzori b292e4a700 [stats wf] added extra logging in the context data retrieval phase 2021-10-13 17:31:53 +02:00
miconis 995c1eddaf minor change 2021-10-13 17:07:10 +02:00
Miriam Baglioni 5d9cc2452d changed the working path parameter value to be dependent on the dnet-workflow working dir parameter 2021-10-13 15:33:50 +02:00
miconis 326bf63775 integration of parent child orgs relations 2021-10-13 12:24:48 +02:00
Miriam Baglioni 16b28494a9 added new parameter in the doiboost process workflow to specify a folder for the process of MAG dataset 2021-10-13 11:34:24 +02:00
Miriam Baglioni 63933808d4 added fix for mixing result types, added configuration default to funder subworkflow 2021-10-13 11:28:28 +02:00
Sandro La Bruzzo 7387416e90 added params skip update to direct transform in OAF, this should be set to true in production 2021-10-12 12:36:30 +02:00
Sandro La Bruzzo 511da98d0c - fixed bug on download pmc Article
- removed unused line of code in SparkCreateActionset
2021-10-12 11:47:49 +02:00
Miriam Baglioni fec40bdd95 merging with branch beta - resolved conflicts 2021-10-12 09:16:36 +02:00
Miriam Baglioni 83f51f1812 refactoring 2021-10-12 09:14:43 +02:00
Sandro La Bruzzo 5606014b17 code refactor see ticket #7065 2021-10-12 08:11:53 +02:00
Claudio Atzori 2f61054cd1 code formatting 2021-10-11 18:29:42 +02:00
Claudio Atzori 83c90c7180 manually merging PR#149 #149 2021-10-11 18:27:05 +02:00
Serafeim Chatzopoulos 201ce71cc1 Add resultsubject, relprojectname and resultacceptanceyear to __all field 2021-10-11 13:16:39 +03:00
Serafeim Chatzopoulos e468a7b96b Add tests to query Solr with different configurations 2021-10-08 16:58:51 +03:00
Serafeim Chatzopoulos de81007302 Add exploreTestConfig, a new Solr configuration folder 2021-10-08 16:54:56 +03:00
Sandro La Bruzzo 8f99d2af86 Make the node of doiBoost to point to the correct OpenAire Organization in relations 2021-10-08 08:35:12 +02:00
Alessia Bardi c48c43fa9e Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2021-10-07 17:30:53 +02:00
Alessia Bardi 8d3b60f446 test for patching records for EOSC Future 2021-10-07 17:30:45 +02:00
miconis 611ca511db set configuration property in openorgs duplicates wf 2021-10-07 15:39:55 +02:00
miconis 9646b9fd98 implementation of the http call for the update of openorgs suggestions 2021-10-07 11:29:11 +02:00
Sandro La Bruzzo 2557bb41f5 Implemented new method for update baseline inside scala node 2021-10-06 16:41:08 +02:00
Sandro La Bruzzo b84e0cabeb Implemented new method for update baseline 2021-10-05 16:34:47 +02:00
Michele Artini d6e1f22408 max numbers of workers for indexing 2021-10-05 15:09:18 +02:00
Michele Artini 210d6c0e6d generateNotificationsJob and indexNotificationsJob 2021-10-05 13:57:46 +02:00
Michele Artini 69008e20c2 log and tests 2021-10-05 11:58:20 +02:00
Sandro La Bruzzo f258bbb927 Merge branch 'beta' of code-repo.d4science.org:D-Net/dnet-hadoop into beta 2021-10-05 10:21:50 +02:00
Sandro La Bruzzo 991b06bd0b removed generation of EBI links from old dump, now EBI link dump is created by another wf 2021-10-05 10:21:33 +02:00
Claudio Atzori cb7efe12ac Merge pull request 'beta' (#146) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: #146
2021-10-05 10:09:37 +02:00
Michele Artini 8bbaa17335 reimplemented of conditions cache as a non static variable 2021-10-05 09:20:37 +02:00
Miriam Baglioni e653756e3d applied some suggestions from Sonar Lint 2021-10-04 18:40:07 +02:00
Michele Artini 0a9ef34b56 test 2021-10-04 15:46:12 +02:00
Michele Artini 31a6ad1d79 optimization of verifySubsriptions() 2021-10-04 12:01:56 +02:00
dimitrispie 3f25d2efb2 Merge branch 'beta' of https://code-repo.d4science.org/antonis.lempesis/dnet-hadoop into beta 2021-10-01 16:03:48 +03:00
dimitrispie 13687fd887 Sprint 3 indicators update 2021-10-01 16:02:02 +03:00
Miriam Baglioni 9814c3e700 merging with branch beta 2021-10-01 13:00:03 +02:00
Miriam Baglioni c4ccd7b32c - 2021-10-01 12:59:47 +02:00
Miriam Baglioni c8321ad31a merge with branch beta 2021-10-01 12:59:08 +02:00
Claudio Atzori b01cd521b0 removed configuration specifying the limit to 8 for spark.dynamicAllocation.maxExecutors 2021-10-01 11:26:33 +02:00
Claudio Atzori ec94cc9b93 IndexNotificationsJob test: persist contents on HDFS instead of passing them to ES 2021-10-01 09:41:27 +02:00
Claudio Atzori 60a6a9a583 [graph2hive] added field 'measures' to the result view 2021-09-30 09:27:26 +02:00
Sandro La Bruzzo 66702b1973 Added node to update datacite 2021-09-28 08:59:06 +02:00
Sandro La Bruzzo 477cb10715 Merge remote-tracking branch 'origin/beta' into beta 2021-09-27 16:57:23 +02:00
Sandro La Bruzzo be79d74e3d Fixed DoiBoost generation to point to correct organization in affiliation relation 2021-09-27 16:57:04 +02:00
Claudio Atzori 474117c2e8 Merge branch 'beta' into dedup_whitelist 2021-09-27 16:41:25 +02:00
Miriam Baglioni 476a4708d6 merging with branch beta 2021-09-27 16:02:32 +02:00
Miriam Baglioni 5ec69889db OpenCitations: creation of AS from OC 2021-09-27 16:02:06 +02:00
Claudio Atzori a53acfbc06 Merge pull request '[stats] updates in the mapping, indicators, wf' (#145) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: #145
2021-09-27 15:59:54 +02:00
Alessia Bardi b924276e18 tests to generate records for the EOSC-Future demo with the EOSC Jupyter Notebook subject 2021-09-24 17:11:56 +02:00
Antonis Lempesis a1e1cf32d7 fixed an impala error 2021-09-24 12:57:24 +03:00
Antonis Lempesis f358cabb2b fixed typo 2021-09-22 21:50:37 +03:00
Miriam Baglioni eedf7c3310 merging with branch beta 2021-09-22 15:18:34 +02:00
Miriam Baglioni f2118d771a first steps in the implementation of the integration of opencitations 2021-09-22 15:18:05 +02:00
Claudio Atzori 7fa60e166e Merge branch 'beta' into dedup_whitelist 2021-09-22 11:31:18 +02:00
Antonis Lempesis 421d55265d created hive action for observatory queries 2021-09-21 03:07:58 +03:00
Enrico Ottonello 92a63f78fe multiple download attempts handling if a connection to orcid server fails 2021-09-20 18:25:00 +02:00
Enrico Ottonello 0c74f5667e Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2021-09-20 18:12:31 +02:00
miconis 853333bdde implementation of the whitelist for similarity relations 2021-09-20 16:21:47 +02:00
Antonis Lempesis 8b681dcf1b attempt to make the observatory wf run in hive 2021-09-18 00:35:14 +03:00
Antonis Lempesis 2943287d10 fixed the definition of cc_licence, part II 2021-09-16 15:59:06 +03:00
Antonis Lempesis dd2329849f fixed the definition of cc_licence 2021-09-16 13:50:34 +03:00
Claudio Atzori 09c2eb7f62 Merge branch 'beta' into clean_relations 2021-09-16 11:09:47 +02:00
Miriam Baglioni e9ccdf853f related to #132 2021-09-15 18:44:54 +02:00
Claudio Atzori 12766bf5f2 Merge branch 'beta' into clean_relations 2021-09-15 17:18:15 +02:00
Claudio Atzori 663b1556d7 manually integrating PR#140 #140 2021-09-15 16:40:25 +02:00
Claudio Atzori ebf53a1616 added cleaning for relation fields: subRelType & relClass according to dedicated vocabs 2021-09-15 16:10:37 +02:00
Enrico Ottonello 8b804e7fe1 removed unused imports 2021-09-14 17:30:52 +02:00
Enrico Ottonello aefa36c54b other task executions go ahead if UnknownHostException happens on a single task 2021-09-14 17:26:15 +02:00
Antonis Lempesis de9bf3a161 added cc_licences and abstracts in observatory db 2021-09-14 01:29:08 +03:00
Antonis Lempesis 9b1936701c fixed yet another typo 2021-09-13 21:07:44 +03:00
Antonis Lempesis 8fc89ae822 moved context table creation before indicators 2021-09-13 14:33:23 +03:00
Antonis Lempesis 461bf90ca6 fixed the gold_oa definition 2021-09-13 11:10:30 +03:00
Antonis Lempesis 43852bac0e creating other::other concept for all contexts 2021-09-13 01:36:41 +03:00
Antonis Lempesis f13cca7e83 moved dependencies of indicators before them... 2021-09-08 23:07:58 +03:00
Antonis Lempesis c6ada217a1 fixed typo 2021-09-08 22:34:59 +03:00
Antonis Lempesis 1250ae197f using new indicators for the definition of peerreviewed, gold, and green 2021-09-08 14:08:43 +03:00
Antonis Lempesis ccee451dde added indicators of sprint 2 in monitor db 2021-09-07 23:17:13 +03:00
Sandro La Bruzzo aed29156c7 changed behavior in transformation job, that doesn't fail at first error 2021-09-07 19:05:46 +02:00
Sandro La Bruzzo 370dddb2fa fix bug on oai iterator that skip record cleaned 2021-09-07 11:20:41 +02:00
Sandro La Bruzzo 3c6fc2096c fix bug on oai iterator that skip record cleaned 2021-09-07 10:46:26 +02:00
Sandro La Bruzzo d4dadf6d77 reduced max number of PID in Relatedentity 2021-09-02 14:21:24 +02:00
Sandro La Bruzzo 9f8a80deb7 fixed wrong import of unresolved relation in openaire 2021-09-01 14:16:27 +02:00
Alessia Bardi 3762b17f7b added VERSION and PART relationship and re-ordered according to my personal and obviously possibly biased ordering
2021-08-31 20:20:05 +02:00
Sandro La Bruzzo e8b3cb9147 Implemented method to download delta updates in EBI Links 2021-08-30 09:32:45 +02:00
Alessia Bardi ccf4103a25 keep the original url if the decoder fails for any reason 2021-08-25 10:07:58 +02:00
Sandro La Bruzzo 45898c71ac fixed wrong doi in pubmed 2021-08-24 15:20:04 +02:00
Alessia Bardi 00a28c0080 originalId was renamed to acronym 2021-08-23 15:02:21 +02:00
Alessia Bardi f19b04d41b code formatting after mvn compile 2021-08-23 14:33:39 +02:00
Alessia Bardi 931f430129 Merge branch 'beta' into datasource_model_eosc_beta 2021-08-23 11:57:21 +02:00
Alessia Bardi 4c1474e693 Dealing with #6859#note-2: we have to decode URLs to avoid & and other chars being encoded because of the original XML representation of data 2021-08-20 17:03:30 +02:00
Miriam Baglioni 5f8ccbc365 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2021-08-20 11:13:47 +02:00
Miriam Baglioni 882abb40e4 CrossrefDump - 2021-08-20 11:12:53 +02:00
Miriam Baglioni 45c62609af CrossrefDump - modified because parameter file was moved 2021-08-20 11:12:31 +02:00
Miriam Baglioni 35880c0e7b CrossrefDump - changed the wf to be able to resume from one of the steps 2021-08-20 11:11:35 +02:00
Miriam Baglioni f3b6c392c1 CrossrefDump - moving parameter file under folder crossref_dump_reader 2021-08-20 11:10:58 +02:00
Miriam Baglioni 65822400ce CrossrefDump - added new parameter file that was missing 2021-08-20 11:10:35 +02:00
Alessia Bardi a053e1513c different funders in blacklist from BETA and PROD aggregator 2021-08-19 11:32:27 +02:00
Alessia Bardi 812bd54c57 different funders in blacklist from BETA and PROD aggregator 2021-08-19 11:30:14 +02:00
Miriam Baglioni a65d3caaea Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2021-08-19 10:29:10 +02:00
Miriam Baglioni e5cf11d088 change open access route to result matching hbm to gold 2021-08-19 10:29:04 +02:00
Claudio Atzori 7c0c67bdd6 added mock pom 2021-08-13 17:45:53 +02:00
Claudio Atzori 82086f3422 fixed directory name 2021-08-13 17:42:14 +02:00
Claudio Atzori bc7068106c added crossref download oozie workflow 2021-08-13 17:19:44 +02:00
Claudio Atzori 2c0a05f11a manually merged PR#139 2021-08-13 17:15:53 +02:00
Claudio Atzori d43667d857 Merge pull request 'Automatic download of Crossref' (#138) from crossref_dw_wf into beta
Reviewed-on: #138
2021-08-13 17:10:10 +02:00
Miriam Baglioni 5856ca8a7b merging with branch beta - resolved conflicts 2021-08-13 16:45:45 +02:00
Miriam Baglioni 6fec71e8d2 removed the specific of the infra we are running the wf from the wf name 2021-08-13 16:39:02 +02:00
Miriam Baglioni ed7e28490a change in sh 2021-08-13 16:19:01 +02:00
Claudio Atzori 7743d0f919 consolidated dnet wf profiles into the same submodule 2021-08-13 16:14:54 +02:00
Miriam Baglioni 6eb7508995 merging with branch beta 2021-08-13 16:07:04 +02:00
Claudio Atzori f74adc4752 added DownloadCSV2 as alternative implementation of the same download procedure 2021-08-13 15:52:15 +02:00
Claudio Atzori 5f0903d50d fixed CSV downloader & tests 2021-08-13 14:17:54 +02:00
Claudio Atzori 17cefe6a97 [HBM] removed stale replace option 2021-08-13 12:43:59 +02:00
Claudio Atzori 7ee2757fcd fixed DownloadCSV parameters spec; workflow patching the hostedby replaces the graph content (publication, datasource) rather than creating a copy 2021-08-13 12:41:01 +02:00
Claudio Atzori c3ad4ab701 minor fixes 2021-08-13 12:23:15 +02:00
Claudio Atzori baed5e3337 test classes moved in specific components 2021-08-13 12:14:47 +02:00
Claudio Atzori 3359f73fcf cleanup & best practices 2021-08-13 12:00:42 +02:00
Miriam Baglioni f4ec81c92c merging with branch beta 2021-08-13 10:31:35 +02:00
Miriam Baglioni dc8b05b39e Hosted By Map - changed the association with the datasource id for the hostedby element: there is no more the need to compute it. With the new HBM it is already the id in the graph 2021-08-13 10:18:25 +02:00
Miriam Baglioni 32fd75691f refactoring 2021-08-13 10:15:42 +02:00
Miriam Baglioni 01db1f8bc4 GetCSV refactoring - removed not needed import 2021-08-13 10:14:17 +02:00
Miriam Baglioni 964a46ca21 GetCSV refactoring - modified due to movement of classes 2021-08-13 10:11:18 +02:00
Miriam Baglioni eaf077fc34 GetCSV refactoring - removed not needed dependency 2021-08-13 10:08:58 +02:00
Miriam Baglioni 5f674efb0c moved dependency version in external pom 2021-08-13 10:07:53 +02:00
Miriam Baglioni 5cd5714530 GetCSV refactoring - added ignore annotation for fields not in input csv 2021-08-13 10:06:49 +02:00
Miriam Baglioni ed183d878e GetCSV refactoring - modified test classes due to change in the model of projects and programme 2021-08-13 09:28:51 +02:00
Miriam Baglioni 8769dd8eef GetCSV refactoring - refactoring due to movement of classes 2021-08-12 18:20:56 +02:00
Miriam Baglioni 6b9e1bf2e3 GetCSV refactoring - removing not needed dependency 2021-08-12 18:17:50 +02:00
Miriam Baglioni d57b2bb927 GetCSV refactoring - removing not needed dependency 2021-08-12 18:12:51 +02:00
Miriam Baglioni 9da74b544a GetCSV refactoring - refactoring due to movement of classes 2021-08-12 18:12:15 +02:00
Miriam Baglioni ab8abd61bb GetCSV refactoring - refactoring due to movement of classes 2021-08-12 18:11:07 +02:00
Miriam Baglioni 335a824e34 GetCSV refactoring - fixed issue 2021-08-12 18:10:10 +02:00
Miriam Baglioni f0845e9865 GetCSV refactoring - refactoring due to movement of classes 2021-08-12 18:04:58 +02:00
Miriam Baglioni 7a789423aa GetCSV refactoring - refactoring due to movement of classes 2021-08-12 18:04:27 +02:00
Miriam Baglioni e9fc3ef3bc GetCSV refactoring - changed to use the new class to get and write the csv file 2021-08-12 18:03:41 +02:00
Miriam Baglioni 4317211a2b GetCSV refactoring - refactoring due to movement 2021-08-12 18:03:14 +02:00
Miriam Baglioni b62cd656a7 GetCSV refactoring - changed the model to store only the information needed 2021-08-12 18:01:10 +02:00
Miriam Baglioni d36e925277 GetCSV refactoring - moved under model package 2021-08-12 18:00:21 +02:00
Miriam Baglioni 6e84b3951f GetCSV refactoring - moving classes to dhp-common that have dependency with GetCSV class (that was located in graph-mapper) 2021-08-12 17:57:41 +02:00
Claudio Atzori 9587d4aee8 Merge branch 'beta' into hostedbymap 2021-08-12 17:04:30 +02:00
Claudio Atzori 86d940044c added test to verify bad records from FWF-E-Book-Library 2021-08-12 11:32:56 +02:00
Claudio Atzori 8cdce59e0e [graph raw] let the mapping exceptions propagate 2021-08-12 11:32:26 +02:00
Miriam Baglioni 08dd2b2102 moving the dependency version to the external pom file 2021-08-11 18:09:41 +02:00
Miriam Baglioni ac417ca798 removed not needed test resource 2021-08-11 17:50:33 +02:00
Miriam Baglioni e33daaeee8 reverting 2021-08-11 17:46:19 +02:00
Miriam Baglioni 785db1d5b2 refactoring 2021-08-11 17:44:07 +02:00
Miriam Baglioni 95e5482bbb removing not needed dependency 2021-08-11 17:42:26 +02:00
Miriam Baglioni b966329833 reverting 2021-08-11 17:37:00 +02:00
Miriam Baglioni 8ad7c71417 reverting 2021-08-11 17:36:12 +02:00
Miriam Baglioni 0e1a6bec20 reverting 2021-08-11 17:32:29 +02:00
Miriam Baglioni c6a2a780a9 reverting 2021-08-11 17:30:17 +02:00
Miriam Baglioni b6b58bba28 reverting 2021-08-11 17:25:37 +02:00
Miriam Baglioni 804589eb30 reverting 2021-08-11 17:23:35 +02:00
Miriam Baglioni d688749ad9 reverting 2021-08-11 17:22:28 +02:00
Miriam Baglioni 524c06e028 reverting 2021-08-11 17:20:30 +02:00
Miriam Baglioni 7aa3260729 reverting 2021-08-11 17:18:45 +02:00
Miriam Baglioni 55fc500d8d reverting 2021-08-11 17:17:48 +02:00
Miriam Baglioni 8229632839 adding assertions to the mapping of the unibi part of gold list 2021-08-11 16:36:01 +02:00
Miriam Baglioni b1c6140ebf removed all comments in Italian 2021-08-11 16:23:33 +02:00
Miriam Baglioni 52c18c2697 removed not needed test class. The functionality has been moved 2021-08-11 16:16:55 +02:00
Miriam Baglioni 8da3a25cf6 merging with branch beta 2021-08-11 15:55:34 +02:00
Claudio Atzori 9f4db73f30 updated/fixed unit tests 2021-08-11 15:02:51 +02:00
Claudio Atzori 61d811ba53 suggestions from intellij 2021-08-11 12:18:20 +02:00
Claudio Atzori 2ee21da43b suggestions from SonarLint 2021-08-11 12:13:22 +02:00
Miriam Baglioni b954fe9ba8 merging with branch beta 2021-08-11 10:12:46 +02:00
Miriam Baglioni b688567db5 hostedbymap - modified part of test to check the bestaccessright changed 2021-08-11 10:12:10 +02:00
Miriam Baglioni 9731a6144a hostedbymap - in case the journal is open access the access may be changed also for the best access right in the result 2021-08-10 17:49:45 +02:00
Miriam Baglioni a90bac3bc9 Graph Dump - added method to test class to verify addition of validation date in projects for community result 2021-08-09 16:36:54 +02:00
Miriam Baglioni bd0d7bfba7 Graph Dump - added resources for testing addition of validation date in project for communityresult 2021-08-09 16:36:17 +02:00
Miriam Baglioni 8daaa32e90 Graph Dump - added resources for testing 2021-08-09 15:46:29 +02:00
Miriam Baglioni bc9e3a06ba Graph Dump - extended the test class 2021-08-09 15:46:06 +02:00
Claudio Atzori d64a942a76 fixed MappersTest 2021-08-09 12:32:26 +02:00
Miriam Baglioni 2efa5abda5 refactoring 2021-08-09 12:28:36 +02:00
Claudio Atzori 577f3b1ac8 added dnet workflows responsible for the graph construction, enrichment, provision 2021-08-09 11:53:58 +02:00
Miriam Baglioni da20fceaf7 removed all the part related to the crossref dump download since it is done in a separate workflow 2021-08-09 11:53:45 +02:00
Claudio Atzori 964f97ed4d cleanup 2021-08-09 11:53:06 +02:00
Miriam Baglioni 54a6cbb244 CrossrefDump - put token among the parameters 2021-08-09 11:41:10 +02:00
Miriam Baglioni b7079804cb CrossrefDump - put token among the parameters 2021-08-09 11:34:35 +02:00
Miriam Baglioni a5f82f442b Merge branch 'beta' into doiboost_wf 2021-08-09 11:17:51 +02:00
Miriam Baglioni b6dcf89d22 merging with branch beta 2021-08-09 11:14:43 +02:00
Miriam Baglioni eff499af9f added new tests and changed the test example 2021-08-09 11:12:30 +02:00
Claudio Atzori a45b95ccc1 resolving conflicts for PR#134 2021-08-09 10:50:03 +02:00
Miriam Baglioni 5d70f842eb merging with branch beta 2021-08-06 18:57:09 +02:00
Miriam Baglioni c3931557e3 extended the logic of the dump to consider the validation date in the relation (also in the dumped result for communities and funders at the level of the project), the extension on the instance for the APC, the pid, the alternate identifiers, and the extension of the AccessRight to store the OpenAccessRoute. Added new resource for testing and extended the old class to verify the new dump. Also fixed an issue on relation dump: only relations whose source and target are entities in the graph are dumped. The same holds for references to projects 2021-08-06 18:56:18 +02:00
Claudio Atzori 66f398fe6f Merge pull request '[stats] fixed a typo' (#133) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: #133
2021-08-06 14:29:57 +02:00
Miriam Baglioni 6bd1eca7e0 merge branch with beta 2021-08-05 15:23:32 +02:00
Miriam Baglioni 73dc082927 added new dumped field (openaccessroute, pid and alternate identifier at the level of the instance) and the bipFinder measure at the level of the result 2021-08-05 15:20:50 +02:00
Miriam Baglioni ee13da9258 merge branch with master 2021-08-05 11:34:20 +02:00
Miriam Baglioni bd096f5170 removed not needed param file 2021-08-05 10:55:43 +02:00
Miriam Baglioni 5faeefbda8 added script to download the dump, changed the workflow input parameters 2021-08-05 10:54:03 +02:00
Miriam Baglioni 1965e4eece new workflow for downloading the dump of crossref and unpack it 2021-08-04 18:29:03 +02:00
Claudio Atzori 83c04e5d28 mapping test for dataset records adapted to reflect the delegated pid authority (zenodo) 2021-08-04 10:37:57 +02:00
Miriam Baglioni b4eb026c8b merging with branch beta 2021-08-04 10:21:37 +02:00
Miriam Baglioni c7b71647c6 Hosted By Map - modification of the resource for testing the presence of only one entry per datasource id 2021-08-04 10:20:02 +02:00
Miriam Baglioni eb8c3f8594 Hosted By Map - test modified because of the application of the new aggregator on datasources 2021-08-04 10:19:17 +02:00
Miriam Baglioni e94ae0b1de Hosted By Map - extension of the workflow to also apply the map to publications and datasources 2021-08-04 10:18:11 +02:00
Miriam Baglioni 67ba4c40e0 Hosted By Map - added parameter resources 2021-08-04 10:17:28 +02:00
Miriam Baglioni eccf3851b0 Hosted By Map - refactoring 2021-08-04 10:16:30 +02:00
Sandro La Bruzzo 74afe43c3a fixed wrong test file 2021-08-04 10:16:17 +02:00
Miriam Baglioni 1e952cccf6 Hosted By Map - refactoring and deletion of not needed methods 2021-08-04 10:15:43 +02:00
Miriam Baglioni 8ba8c77f92 Hosted By Map - refactoring 2021-08-04 10:14:57 +02:00
Miriam Baglioni 8f7623e77a Hosted By Map - refactoring and application of the new aggregator 2021-08-04 10:14:20 +02:00
Sandro La Bruzzo 3fc820203b fixed wrong test file 2021-08-04 10:13:59 +02:00
Miriam Baglioni a7bf314fd2 Hosted By Map - added new aggregator to get just one result per datasource id 2021-08-04 10:13:30 +02:00
Miriam Baglioni 9831725073 Hosted By Map - removed an unneeded step from the workflow. The hbm will also take care of the integration of the unibi list of gold open access journals 2021-08-03 11:02:17 +02:00
Miriam Baglioni 100e54e6c8 merging with branch beta 2021-08-03 10:47:11 +02:00
Miriam Baglioni 461b8a29a0 removed not needed class 2021-08-03 10:46:51 +02:00
Miriam Baglioni 327cddde33 Hosted By Map - refactoring 2021-08-03 10:44:13 +02:00
Miriam Baglioni 17292c6641 Hosted By Map - resources for testing purposes 2021-08-02 19:37:08 +02:00
Miriam Baglioni ee7ccb98dc Hosted By Map - test class to verify the application of the hbm to results and datasource 2021-08-02 19:36:18 +02:00
Miriam Baglioni 90e91486e2 Hosted By Map - test class to verify each step in the preparation process 2021-08-02 19:35:52 +02:00
Miriam Baglioni 1e859706a3 Hosted By Map - Classes to apply the HBM to results and datasources 2021-08-02 19:35:23 +02:00
Miriam Baglioni 72df8f9232 Hosted By Map - removed the aggregator for the datasource (it is no longer needed) and added a new aggregator for the results. Also changed the hostedBYMap aggregator 2021-08-02 19:34:44 +02:00
Miriam Baglioni ff1ce75e33 Hosted By Map - modification in the code to prepare the info needed to apply the HostedByMap. There is no need to join datasources with the hbm: all the information needed is in the hosted by map already 2021-08-02 19:32:59 +02:00
Claudio Atzori e826aae848 using constants from ModelConstants 2021-08-02 14:28:59 +02:00
Antonis Lempesis 117c3d5c67 fixed a typo 2021-08-02 12:15:58 +03:00
Miriam Baglioni 1695d45bd4 Hosted By Map - Test class to verify the preparation of the intermediate information 2021-07-30 17:57:01 +02:00
Miriam Baglioni 7c6ea2f4c7 Hosted By Map - first attempt at creating the intermediate information used to apply the hosted by map on the graph entities 2021-07-30 17:56:27 +02:00
Miriam Baglioni d8b9b0553b Hosted By Map - model classes to store the intermediate information to be used to apply the hosted by map 2021-07-30 17:55:39 +02:00
Miriam Baglioni 613bd3bde0 Hosted By Map - refactor of the first attempt to prepare a new hosted by map dependent on the datasources in the graph and on two external sources: the gold list from unibi and the doaj list of open access journals. Both lists are downloaded from the provided url parameter 2021-07-30 17:54:45 +02:00
Miriam Baglioni d1807781c0 merging with branch beta 2021-07-30 14:34:07 +02:00
Miriam Baglioni 1d6ac3715b merge branch with beta 2021-07-30 11:58:29 +02:00
Claudio Atzori 19620eed46 applying PR#131, Patch the identifiers (source/target) in the relations, refinements 2021-07-30 11:09:32 +02:00
Claudio Atzori 4f78565c04 fixed implementation of PatchRelationsApplication, refined the relative unit test 2021-07-30 11:07:09 +02:00
Claudio Atzori a6a38cca9e fixed implementation of PatchRelationsApplication, refined the relative unit test 2021-07-30 11:06:11 +02:00
Miriam Baglioni 9bc4fd3b69 Patch FCT relations - fixed issue with join 2021-07-30 10:34:05 +02:00
Miriam Baglioni 2fc89fc9b5 Merge branch 'fct_project_id_replacement' of https://code-repo.d4science.org/D-Net/dnet-hadoop into fct_project_id_replacement 2021-07-30 10:20:43 +02:00
Claudio Atzori 081fe92a21 Merge branch 'fct_project_id_replacement' of https://code-repo.d4science.org/D-Net/dnet-hadoop into fct_project_id_replacement 2021-07-30 10:13:56 +02:00
Claudio Atzori 576693d782 added unit test for PatchRelationsApplication 2021-07-30 10:13:33 +02:00
Claudio Atzori 55e6470f44 Merge pull request 'added the sprint 2 indicators in monitor db' (#129) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: #129
2021-07-30 10:11:46 +02:00
Sandro La Bruzzo 6358f92c3a added sleep to solve problem of lost request of creating index 2021-07-30 08:54:37 +02:00
Antonis Lempesis 26af0320d0 added the sprint 2 indicators in monitor db 2021-07-30 00:31:33 +03:00
Claudio Atzori 7b172e7cd9 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2021-07-29 13:57:06 +02:00
Claudio Atzori c53d106e80 [provision] lowercase relation filter 2021-07-29 13:57:00 +02:00
Claudio Atzori 6e3554a45e [provision] lowercase relation filter 2021-07-29 13:56:37 +02:00
Sandro La Bruzzo b1b0cc3f15 fixed wrong package name 2021-07-29 13:55:08 +02:00
Miriam Baglioni baad01cadc hostedbymap 2021-07-29 13:04:39 +02:00
Claudio Atzori e725c88ebb [raw_all] patching relation identifier phase to be run at the end, i.e. includes also claimed relations 2021-07-29 13:03:43 +02:00
Claudio Atzori 5d08ad86ae [raw_all] patching relation identifier phase to be run at the end, i.e. includes also claimed relations 2021-07-29 13:03:16 +02:00
Claudio Atzori e87e1805c4 [raw_all] added extra workflow step for patching the identifiers in the relations, given an id mapping dataset 2021-07-29 12:13:06 +02:00
Claudio Atzori 5f7330d407 Merge branch 'master' into fct_project_id_replacement 2021-07-29 11:38:22 +02:00
Claudio Atzori 1923c1ce21 replaced full join + filtering with a left join 2021-07-29 11:36:20 +02:00
Claudio Atzori dc55ed4acd Merge pull request '[beta] stats update workflow' (#128) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: #128
2021-07-29 11:13:21 +02:00
Claudio Atzori 908f57a475 code formatting 2021-07-29 10:49:39 +02:00
Sandro La Bruzzo 3721df7aa6 refactoring create actionset of scholexplorer, moved to package dhp-aggregation 2021-07-29 10:45:35 +02:00
Antonis Lempesis 4afa5215a9 fixed a NPE? 2021-07-28 21:59:12 +03:00
Antonis Lempesis 3d1580fa9b fixed a typo 2021-07-28 18:50:31 +03:00
Claudio Atzori 4c5a71ba2f [broker] updated relation descriptors, making use of constant values 2021-07-28 17:11:18 +02:00
Claudio Atzori a9961a1835 [cleaning] title cleaning based on the me.xuender:unidecode library 2021-07-28 16:36:33 +02:00
Claudio Atzori e1797c0a42 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2021-07-28 16:21:36 +02:00
Claudio Atzori 6dddad86ee [cleaning] title cleaning based on the me.xuender:unidecode library 2021-07-28 16:21:29 +02:00
Sandro La Bruzzo 3d8f0f629b implemented workflow of creation action set for scholexplorer 2021-07-28 16:15:34 +02:00
Antonis Lempesis 9b181ffa73 added the h2020 classification scheme for projects 2021-07-28 16:31:29 +03:00
Alessia Bardi df8715a1ec format code after mvn compile 2021-07-28 11:58:26 +02:00
Michele Artini 3e2a2d6e71 added new fields in xml 2021-07-28 11:56:55 +02:00
Alessia Bardi c806387d4b tests for enermaps 2021-07-28 11:54:36 +02:00
Alessia Bardi 9594343725 code formatting after mvn compile 2021-07-28 11:41:34 +02:00
Claudio Atzori 2fff24df55 code formatting 2021-07-28 11:34:19 +02:00
Michele Artini 9f1c7b8e17 tests 2021-07-28 11:32:34 +02:00
Antonis Lempesis 4a9741825d added result_orcid, result_project provenance, issn in datasources 2021-07-28 12:28:04 +03:00
Miriam Baglioni 3d2bba3d5d removing not needed classes 2021-07-28 11:25:43 +02:00
Miriam Baglioni cc0d3d8a7b merging with branch beta 2021-07-28 11:24:46 +02:00
Michele Artini e6f1773d63 mapping of new eosc fields 2021-07-28 11:17:11 +02:00
Miriam Baglioni 80d5b3b4de DoiBoost AccessRight #4362 - removing commented code 2021-07-28 11:16:49 +02:00
Miriam Baglioni 5fe016dcbc DoiBoost AccessRight #4362 - related to https://code-repo.d4science.org/D-Net/dnet-hadoop/pulls/126/files#issuecomment-4194 2021-07-28 11:14:28 +02:00
Miriam Baglioni 73ed7374a9 merging with branch beta 2021-07-28 11:05:16 +02:00
Miriam Baglioni 43e62fcae9 DoiBoost AccessRight #4362 - related to https://code-repo.d4science.org/D-Net/dnet-hadoop/pulls/126/files#issuecomment-4193 2021-07-28 11:04:55 +02:00
Michele Artini c72c960ffb added eosc fields 2021-07-28 11:03:15 +02:00
Michele Artini 1fb572a33a added eosc fields 2021-07-28 10:52:24 +02:00
Miriam Baglioni 708d0ade34 Merge branch 'beta' into hostedbymap 2021-07-28 10:37:22 +02:00
Sandro La Bruzzo 16c91203bd implemented workflow of creation action set for scholexplorer 2021-07-28 10:30:49 +02:00
Miriam Baglioni 6c936943aa merging with branch beta 2021-07-28 10:24:48 +02:00
Miriam Baglioni 0424f47494 HostedByMap fixing issues 2021-07-28 10:24:13 +02:00
Michele Artini 52e2315ba2 removed trick for datasourcetypeui 2021-07-28 10:23:00 +02:00
Claudio Atzori d267dce520 [raw_all] added extra workflow step for patching the identifiers in the relations, given an id mapping dataset 2021-07-27 17:18:29 +02:00
Sandro La Bruzzo 825d9f0289 fixed datacite workflow starting from Importing delta 2021-07-27 16:09:46 +02:00
Claudio Atzori 5aa7d16d1b updated assertions in eu.dnetlib.dhp.oa.graph.raw.MappersTest 2021-07-27 15:11:58 +02:00
Claudio Atzori 998b66855a updated assertions in eu.dnetlib.dhp.oa.graph.raw.MappersTest 2021-07-27 15:11:37 +02:00
Antonis Lempesis 1a28a69cac changed the citeee in *_citations to cites 2021-07-27 15:14:09 +03:00
Miriam Baglioni 74f801b689 merging with branch beta 2021-07-27 13:18:31 +02:00
Miriam Baglioni 35e395eae8 merge with master 2021-07-27 12:34:59 +02:00
Miriam Baglioni eb07f7f40f Hosted By Map 2021-07-27 12:27:26 +02:00
Antonis Lempesis ed185fd7ed added missing colons 2021-07-27 11:42:47 +03:00
Antonis Lempesis f3b9570354 properly invalidating metadata 2021-07-26 13:00:16 +03:00
Sandro La Bruzzo 848aabbb6c minor fix 2021-07-25 12:06:41 +02:00
Sandro La Bruzzo 8fac10c91e fixed definition of the wf for the creation of the final infospace of scholexplorer 2021-07-25 11:15:37 +02:00
Sandro La Bruzzo 3920c69bc8 change implementation of resolve Relation to generate jsonRdd in output 2021-07-25 09:51:36 +02:00
Antonis Lempesis f9fbb0f261 added indicators second sprint 2021-07-24 16:40:28 +03:00
Claudio Atzori a0393607a7 mapping funding relations from Datacite should be done according to the actual result identifier 2021-07-23 18:15:08 +02:00
Claudio Atzori 5b6844b969 mapping funding relations from Datacite should be done according to the actual result identifier 2021-07-23 18:14:37 +02:00
Sandro La Bruzzo d9e3b89937 implemented last part of workflows to generate scholixGraph 2021-07-23 16:38:32 +02:00
Sandro La Bruzzo cfde63a7c3 fixed resolve relation join 2021-07-23 14:17:29 +02:00
Sandro La Bruzzo 4a439c3863 NPE fixed 2021-07-23 14:17:29 +02:00