1
0
Fork 0
Commit Graph

3118 Commits

Author SHA1 Message Date
Claudio Atzori 941a50a2fc Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2021-11-15 14:42:49 +01:00
Claudio Atzori 7c804acda8 [graph resolution] minor 2021-11-15 14:42:43 +01:00
Sandro La Bruzzo efa09057db Merge branch 'beta' of code-repo.d4science.org:D-Net/dnet-hadoop into beta 2021-11-15 14:32:09 +01:00
Sandro La Bruzzo 48923e46a1 added documentation to Pubmed Class and also added mvn site for dhp-aggregations 2021-11-15 14:32:01 +01:00
Claudio Atzori d2c787d416 [graph resolution] fixed sequence of the workflow steps 2021-11-15 14:31:15 +01:00
Claudio Atzori 975b10b711 [actionmanager] increased spark.sql.shuffle.partitions to 5000 2021-11-15 12:31:45 +01:00
Miriam Baglioni 4ec88c718c merge with beta - resolved conflict in pom 2021-11-15 10:52:16 +01:00
Miriam Baglioni 6f1a434e90 [Bypass Action Set] Fixed test to consider the new identifier utils 2021-11-15 09:59:23 +01:00
Miriam Baglioni 157d33ebf9 [Bypass Action Set] Refactoring 2021-11-15 09:58:48 +01:00
Miriam Baglioni 6595135a1a [Dump Schemas] changed the schema of the dumped result according to the modifications in the bestAccessRight type 2021-11-12 11:45:38 +01:00
Miriam Baglioni 43cae4ad88 Merge branch 'dump' of https://code-repo.d4science.org/D-Net/dnet-hadoop into dump 2021-11-12 11:36:54 +01:00
Miriam Baglioni b3f9370125 merge with beta - resolved conflict in pom 2021-11-12 11:25:26 +01:00
Miriam Baglioni 92d0e18b55 [Bypass Action Set] used constant DOI instead of "doi" 2021-11-12 10:56:58 +01:00
Miriam Baglioni 881113743f [Bypass Action Set] refactoring 2021-11-12 10:55:50 +01:00
Miriam Baglioni 47ccb53c4f [Bypass Action Set] modification for comment D-Net/dnet-hadoop#157 (comment) 2021-11-12 10:54:09 +01:00
Miriam Baglioni ffb0ce1d59 merge with beta - resolved conflict in pom 2021-11-12 10:19:59 +01:00
Miriam Baglioni 716021546e [Bypass Action Set] minor fix 2021-11-12 10:18:01 +01:00
Sandro La Bruzzo 3469cc2b1d Merge branch 'beta' of code-repo.d4science.org:D-Net/dnet-hadoop into beta 2021-11-12 09:56:52 +01:00
Sandro La Bruzzo a7763d2492 removed alternate identifier in resolutionMap 2021-11-12 09:56:45 +01:00
Miriam Baglioni b8bdabfae9 [Graph DUmp] removed OpenAccessRoute from test in best access right 2021-11-11 16:16:48 +01:00
Miriam Baglioni e5498052e8 [Graph DUmp] removed OpenAccessRoute from test in best access right 2021-11-11 16:14:10 +01:00
Miriam Baglioni 935062edec [Bypass Action Set] creation of unresolved entities 2021-11-11 16:11:25 +01:00
Antonis Lempesis 26f086dd64 removed the too restrctive clause. will discuss again 2021-11-11 12:57:19 +02:00
Claudio Atzori 148289150f Merge branch 'beta' into doiboost_url 2021-11-11 10:40:19 +01:00
Sandro La Bruzzo 2ca0a436ad added SparkResolveEntities node to the oozie wf 2021-11-11 10:25:42 +01:00
Sandro La Bruzzo 9cb195314f implemented and tested resolution of entities 2021-11-11 10:17:40 +01:00
Miriam Baglioni 6d3c4c4abe mergin with branch beta 2021-11-11 08:59:53 +01:00
Miriam Baglioni 8cc50ecee0 [Graph Dump] changed AccessRight with BestAccessRight in the dump and modified the dependency to the schema to the SNAPSHOT 2021-11-11 08:59:20 +01:00
Miriam Baglioni 88b73f4f49 mergin with branch beta 2021-11-10 17:00:52 +01:00
Miriam Baglioni c371b23077 - 2021-11-10 17:00:37 +01:00
Miriam Baglioni 9e214ce0eb [BypassAS] addition of OC relations 2021-11-09 12:07:19 +01:00
Sandro La Bruzzo 6477a40670 implement filter of openCitation 2021-11-09 11:27:12 +01:00
Miriam Baglioni 6f7ca539c6 [BypassAS] update of results for bipFinder and FOS 2021-11-09 11:25:41 +01:00
Miriam Baglioni a7d50c499b [BypassAS] prepare FOS subject, test and model for FOS and BipFinder scores 2021-11-08 16:44:19 +01:00
Antonis Lempesis 91354c6068 - fetching all context related results
- storing tables as parquet
2021-11-08 15:15:46 +02:00
Miriam Baglioni 94918a673c [Graph DUMP] Fix issue for empty origilaId list 2021-11-08 10:25:28 +01:00
Claudio Atzori 9cb8e4ad21 Merge branch 'beta' into hierarchical_orgs_relations 2021-11-08 09:40:24 +01:00
Miriam Baglioni 4c70201412 mergin with branch beta 2021-11-05 12:29:56 +01:00
Miriam Baglioni 8442efd8d1 [Graph DUMP] Filtering out from the originalIds the id of the result in OpenAIRE 2021-11-05 12:29:22 +01:00
Claudio Atzori 5681e89544 Update 'dhp-workflows/dhp-graph-mapper/src/main/resources/eu/dnetlib/dhp/oa/graph/dump/schemas/result_schema.json' 2021-11-05 12:18:24 +01:00
Miriam Baglioni a22c29fba1 [Graph DUMP] Filtering out from the originalIds the id of the result in OpenAIRE 2021-11-05 12:08:33 +01:00
Miriam Baglioni c10ff6928c [Graph DUMP] add schema of the dump related to the model as in dhp-schemas.2.8.31. Note the measere element at the level of the result has been removed because of issues on where to display it: at the level of the result or at the level of the entity 2021-11-05 11:36:21 +01:00
Miriam Baglioni 0857849a86 [Graph DUMP] Remove dump of measure until it will be clear where to put it (at the level of result or at the level of the instance) 2021-11-05 11:02:37 +01:00
Miriam Baglioni df7ee77c7a [DOIBoost Mapping] removed not needed comments 2021-11-04 16:24:07 +01:00
Miriam Baglioni de63d29b6f [DOIBoost Mapping] Fix to avoid to produce results with null as identifier (probably due to the filtering function in the factory for the creation of the id) 2021-11-04 16:16:40 +01:00
Miriam Baglioni d50057b2d9 [DOIBoost Mapping] changed the way to create the url for the instance: we use the crooref guidelines https://doi.org/doi 2021-11-03 16:59:37 +01:00
Miriam Baglioni edf55395e9 added test resourse 2021-11-03 16:49:30 +01:00
Miriam Baglioni d97ea82a29 [DOIBoost Mapping] Added test to verify the instance created for Crossref will have just the url related to the doi 2021-11-03 16:45:15 +01:00
Miriam Baglioni 96769b4481 [DOIBoost - Mapping] Changed the logic which brought in in the instance urls that should not be there: The urld of the doi in the json is reachable from the root (json/"URL") other urls where added from the links element. Now the mapping from the link element has been removed 2021-11-03 16:43:36 +01:00
Miriam Baglioni 683fe093cf [DOIBoost - Mapping] Remove the addition of the instance to the MAG publication record 2021-11-03 15:51:26 +01:00
Miriam Baglioni b2bb8d9d79 [DOIBoost - Mapping] selecting the url from Crossref containing the doi 2021-11-03 15:44:57 +01:00
Miriam Baglioni 779318961c [DOIBoost - Mapping] removed the url from crossref containing the api.elsevier.com... string in the url 2021-11-03 14:38:52 +01:00
Miriam Baglioni 2480e590d1 [DOIBoost - Mapping] changed the type on which to map dissertation from Crossref: from 006 Doctoral thesis to 0044 Thesis since dissertation could be either Doctoral or master thesis 2021-11-03 14:25:23 +01:00
Miriam Baglioni b9d124bb7c [Enrichment: Propagation through parent-child relationships] Added counters, and changed constraint to verify if filtering out the relation (from classname = harvested to classid != propagation) 2021-11-03 13:55:37 +01:00
Sandro La Bruzzo 7bd224f051 implement first version of scholexplorer integration for the generation of final graph 2021-11-02 15:58:15 +01:00
Claudio Atzori 7fa49f6956 Merge pull request 'removed hardcoded reference' (#154) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: D-Net/dnet-hadoop#154
2021-11-02 09:11:30 +01:00
Antonis Lempesis f78afb5ef9 removed hardcoded reference 2021-11-01 15:42:29 +02:00
Miriam Baglioni 2aca6bfa0a mergin with branch beta 2021-10-29 11:20:45 +02:00
Miriam Baglioni 09f36cffb8 [Enrichment: Propagation through parent-child relationships] First implementation, testing, and wf for propagation of result to organization through semantic relation 2021-10-29 11:20:03 +02:00
Claudio Atzori 1225ba0b92 [resolution] increasing number of partitions to avoid OOM 2021-10-28 16:18:17 +02:00
Sandro La Bruzzo d9cbca83f7 moved filter on next phase 2021-10-28 16:13:24 +02:00
Claudio Atzori d02caef185 Merge branch 'beta' into hierarchical_orgs_relations 2021-10-27 15:36:29 +02:00
Sandro La Bruzzo 1be9aa0a5f Removed filter of datacite items from the raw graph merging phase, Datacite is not an actionset anymore in beta 2021-10-26 17:52:20 +02:00
Sandro La Bruzzo 4acfa8fa2e Scholexplorer Datasource Aggregation:
- Added collectedfrom in the inverse relation generated
Relation resolution:
- increased number of partitions in workflow.xml
- using classid instead of classname to build the pid-dnetId mapping
2021-10-26 17:51:20 +02:00
Miriam Baglioni d0ef7d91c5 adding test resource 2021-10-26 17:34:11 +02:00
Sandro La Bruzzo 034304b33a conflict resolved on merge 2021-10-26 09:40:47 +02:00
Michele Artini d66e20e7ac added hierarchy rel in ROR actionset 2021-10-21 15:51:48 +02:00
Claudio Atzori d147295c2f avoiding java.io.NotSerializableException: java.util.HashMap 2021-10-21 14:15:57 +02:00
Claudio Atzori 3702fe478d cleanup 2021-10-21 12:05:02 +02:00
Sandro La Bruzzo ac36aa7d1c fixed wrong Encoding during a map phase 2021-10-21 11:35:02 +02:00
Sandro La Bruzzo aeeebd573b code refactor renamed datacite package 2021-10-20 17:37:42 +02:00
Sandro La Bruzzo ab3a99d3e9 removed old datacite oozie workflow 2021-10-20 17:19:47 +02:00
Sandro La Bruzzo ae4e99a471 Adapted workflow of resolution of PID to work into OpenAIRE data workflow
- Added relations in both verse on all Scholexplorer datasources
2021-10-20 17:12:16 +02:00
Claudio Atzori 4f8970f8ed [stats] reducing the step22 wait time 2021-10-20 14:14:53 +02:00
Claudio Atzori 00b78b9c58 cleanup: mapping contents in the graph already defined in the OAF graph model doesn't require to be aware of the vocabularies 2021-10-20 14:04:45 +02:00
Claudio Atzori c01dd0c925 registered oaf model classes for the KryoSerializer 2021-10-20 13:55:07 +02:00
Miriam Baglioni 652114c641 [affiliationPropagation] first try. preparetion 2021-10-20 11:44:23 +02:00
Claudio Atzori 59f76b50d4 Merge branch 'beta' into hierarchical_orgs_relations 2021-10-20 09:42:35 +02:00
Antonis Lempesis 241dcf6df1 Merge branch 'beta' into beta 2021-10-19 23:54:21 +02:00
Claudio Atzori 515e068a78 Merge branch 'beta' into hierarchical_orgs_relations 2021-10-19 16:46:06 +02:00
Claudio Atzori 512e7b0170 code formatting 2021-10-19 16:19:29 +02:00
Claudio Atzori e9157c67aa Merge branch 'beta' into dump 2021-10-19 16:15:03 +02:00
Claudio Atzori 98f37c8d81 WIP: worflow nodes for including Scholexplorer records in the RAW graph 2021-10-19 16:14:40 +02:00
Claudio Atzori c8850456e9 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2021-10-19 16:09:54 +02:00
Sandro La Bruzzo c9870c5122 code formatted 2021-10-19 15:24:59 +02:00
Sandro La Bruzzo f8329bc110 since dhp-schemas changed, introducing new Relation inverse model, this class has been updated 2021-10-19 15:24:22 +02:00
Claudio Atzori 7a73010acd WIP: worflow nodes for including Scholexplorer records in the RAW graph 2021-10-19 11:59:16 +02:00
Miriam Baglioni c7f6cd2591 added again the setting for saXReader 2021-10-19 10:15:26 +02:00
miconis 5f780a6ba1 bug fix in migrate entities: parameter name was wrong 2021-10-18 23:30:40 +02:00
Miriam Baglioni 1315952702 merge with branch beta 2021-10-18 14:17:09 +02:00
Miriam Baglioni 1cc09adfaa Opencitations: chenaged the test class to mirror the creation or not of duplicate dois for .refs oc original plus added optional parameter to duplicate the relation 2021-10-18 14:11:27 +02:00
Miriam Baglioni 76d41602be Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2021-10-18 10:53:22 +02:00
Miriam Baglioni 46f82c7c8f removed not needed folder deletion 2021-10-18 10:53:16 +02:00
Sandro La Bruzzo 7b15b88d4c renamed wrong package, implemented last aggregation workflow for scholexplorer 2021-10-15 15:00:15 +02:00
Antonis Lempesis 41ecb1eb61 invalidating medatadata before context thingies 2021-10-15 13:42:55 +03:00
Antonis Lempesis 4b7c8dff2d fetching affiliated results for 4 orgs in monitor. fixed affiliated orgs in stats db 2021-10-14 18:53:35 +03:00
Sandro La Bruzzo 51a03c0a50 refactor code for EBI from dhp-graph-mapper into dhp-aggregation 2021-10-14 14:23:13 +02:00
Claudio Atzori 14fbf92ad6 Merge branch 'beta' into beta_solr_config 2021-10-14 11:08:44 +02:00
Claudio Atzori b292e4a700 [stats wf] added extra logging in the context data retrieval phase 2021-10-13 17:31:53 +02:00
miconis 995c1eddaf minor change 2021-10-13 17:07:10 +02:00
Miriam Baglioni 5d9cc2452d changed the working path parameter value as dependant from the dnet-workflow working dir parameter 2021-10-13 15:33:50 +02:00
miconis 326bf63775 integration of parent child orgs relations 2021-10-13 12:24:48 +02:00
Miriam Baglioni 16b28494a9 added new parameter in the doiboost process workflow to specify a folder for the process of MAG dataset 2021-10-13 11:34:24 +02:00
Miriam Baglioni 63933808d4 added fix for mixing result types, added configuration default to funder subworkflow 2021-10-13 11:28:28 +02:00
Sandro La Bruzzo 7387416e90 added params skip update to direct transform in OAF, this should be set to true in production 2021-10-12 12:36:30 +02:00
Sandro La Bruzzo 511da98d0c - fixed bug on download pmc Article
- removed unused line of code in SparkCreateActionset
2021-10-12 11:47:49 +02:00
Miriam Baglioni fec40bdd95 merging with branch beta - resolved conflicts 2021-10-12 09:16:36 +02:00
Miriam Baglioni 83f51f1812 refactoring 2021-10-12 09:14:43 +02:00
Sandro La Bruzzo 5606014b17 code refactor see ticket #7065 2021-10-12 08:11:53 +02:00
Serafeim Chatzopoulos 201ce71cc1 Add resultsubject, relprojectname and resultacceptanceyear to __all field 2021-10-11 13:16:39 +03:00
Serafeim Chatzopoulos e468a7b96b Add tests to query Solr with different configurations 2021-10-08 16:58:51 +03:00
Serafeim Chatzopoulos de81007302 Add exploreTestConfig, a new Solr configuration folder 2021-10-08 16:54:56 +03:00
Sandro La Bruzzo 8f99d2af86 Make the node of doiBoost to point to the correct OpenAire Organization in relations 2021-10-08 08:35:12 +02:00
Alessia Bardi c48c43fa9e Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2021-10-07 17:30:53 +02:00
Alessia Bardi 8d3b60f446 test for patching records for EOSC Future 2021-10-07 17:30:45 +02:00
miconis 611ca511db set configuration property in openorgs duplicates wf 2021-10-07 15:39:55 +02:00
miconis 9646b9fd98 implementation of the http call for the update of openorgs suggestions 2021-10-07 11:29:11 +02:00
Sandro La Bruzzo 2557bb41f5 Implemented new method for update baseline inside scala node 2021-10-06 16:41:08 +02:00
Sandro La Bruzzo b84e0cabeb Implemented new method for update baseline 2021-10-05 16:34:47 +02:00
Sandro La Bruzzo f258bbb927 Merge branch 'beta' of code-repo.d4science.org:D-Net/dnet-hadoop into beta 2021-10-05 10:21:50 +02:00
Sandro La Bruzzo 991b06bd0b removed generation of EBI links from old dump, now EBI link dump is created by another wf 2021-10-05 10:21:33 +02:00
Claudio Atzori cb7efe12ac Merge pull request 'beta' (#146) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: D-Net/dnet-hadoop#146
2021-10-05 10:09:37 +02:00
Miriam Baglioni e653756e3d applied some suggestiond from Sonar Lint 2021-10-04 18:40:07 +02:00
dimitrispie 3f25d2efb2 Merge branch 'beta' of https://code-repo.d4science.org/antonis.lempesis/dnet-hadoop into beta 2021-10-01 16:03:48 +03:00
dimitrispie 13687fd887 Sprint 3 indicators update 2021-10-01 16:02:02 +03:00
Miriam Baglioni 9814c3e700 mergin with branch beta 2021-10-01 13:00:03 +02:00
Miriam Baglioni c4ccd7b32c - 2021-10-01 12:59:47 +02:00
Miriam Baglioni c8321ad31a merge with branch beta 2021-10-01 12:59:08 +02:00
Claudio Atzori 60a6a9a583 [graph2hive] added field 'measures' to the result view 2021-09-30 09:27:26 +02:00
Sandro La Bruzzo 66702b1973 Added node to update datacite 2021-09-28 08:59:06 +02:00
Sandro La Bruzzo 477cb10715 Merge remote-tracking branch 'origin/beta' into beta 2021-09-27 16:57:23 +02:00
Sandro La Bruzzo be79d74e3d Fixed DoiBoost generation to point to correct organization in affiliation relation 2021-09-27 16:57:04 +02:00
Claudio Atzori 474117c2e8 Merge branch 'beta' into dedup_whitelist 2021-09-27 16:41:25 +02:00
Miriam Baglioni 476a4708d6 mergin with branch beta 2021-09-27 16:02:32 +02:00
Miriam Baglioni 5ec69889db OpenCitations: creation of AS from OC 2021-09-27 16:02:06 +02:00
Claudio Atzori a53acfbc06 Merge pull request '[stats] updates in the mapping, indicators, wf' (#145) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: D-Net/dnet-hadoop#145
2021-09-27 15:59:54 +02:00
Alessia Bardi b924276e18 tests to generate records for the EOSC-Future demo with the EOSC Jupyter Notebbok subject 2021-09-24 17:11:56 +02:00
Antonis Lempesis a1e1cf32d7 fixed an impala error 2021-09-24 12:57:24 +03:00
Antonis Lempesis f358cabb2b fixed typo 2021-09-22 21:50:37 +03:00
Miriam Baglioni eedf7c3310 mergin with branch beta 2021-09-22 15:18:34 +02:00
Miriam Baglioni f2118d771a first steps in the implementation of the integration of opencitations 2021-09-22 15:18:05 +02:00
Claudio Atzori 7fa60e166e Merge branch 'beta' into dedup_whitelist 2021-09-22 11:31:18 +02:00
Antonis Lempesis 421d55265d created hive action for observatory queries 2021-09-21 03:07:58 +03:00
Enrico Ottonello 92a63f78fe multiple download attempts handling if a connection to orcid server fails 2021-09-20 18:25:00 +02:00
Enrico Ottonello 0c74f5667e Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2021-09-20 18:12:31 +02:00
miconis 853333bdde implementation of the whitelist for similarity relations 2021-09-20 16:21:47 +02:00
Antonis Lempesis 8b681dcf1b attempt to make the observatory wf run in hive 2021-09-18 00:35:14 +03:00
Antonis Lempesis 2943287d10 fixed the definition of cc_licence, part II 2021-09-16 15:59:06 +03:00
Antonis Lempesis dd2329849f fixed the definition of cc_licence 2021-09-16 13:50:34 +03:00
Claudio Atzori 09c2eb7f62 Merge branch 'beta' into clean_relations 2021-09-16 11:09:47 +02:00
Miriam Baglioni e9ccdf853f related to D-Net/dnet-hadoop#132 2021-09-15 18:44:54 +02:00
Claudio Atzori 12766bf5f2 Merge branch 'beta' into clean_relations 2021-09-15 17:18:15 +02:00
Claudio Atzori 663b1556d7 manually integrating PR#140 D-Net/dnet-hadoop#140 2021-09-15 16:40:25 +02:00
Claudio Atzori ebf53a1616 added cleaning for relation fields: subRelType & relClass according to dedicated vocabs 2021-09-15 16:10:37 +02:00
Enrico Ottonello 8b804e7fe1 removed unused imports 2021-09-14 17:30:52 +02:00
Enrico Ottonello aefa36c54b other task executions go ahead if UnknownHostException happens on a single task 2021-09-14 17:26:15 +02:00
Antonis Lempesis de9bf3a161 added cc_licences and abstracts in observatory db 2021-09-14 01:29:08 +03:00
Antonis Lempesis 9b1936701c fixed yet another typo 2021-09-13 21:07:44 +03:00
Antonis Lempesis 8fc89ae822 moved context table creation before indicators 2021-09-13 14:33:23 +03:00
Antonis Lempesis 461bf90ca6 fixed the gold_oa definition 2021-09-13 11:10:30 +03:00
Antonis Lempesis 43852bac0e creating other::other concept for all contexts 2021-09-13 01:36:41 +03:00
Antonis Lempesis f13cca7e83 moved dependencies of indicators before them... 2021-09-08 23:07:58 +03:00
Antonis Lempesis c6ada217a1 fixed typo 2021-09-08 22:34:59 +03:00
Antonis Lempesis 1250ae197f using new indicators for the definition of peerreviewed, gold, and green 2021-09-08 14:08:43 +03:00
Antonis Lempesis ccee451dde added indicators of sprint 2 in monitor db 2021-09-07 23:17:13 +03:00
Sandro La Bruzzo aed29156c7 changed behavior in transformation job, that doesn't fail at first error 2021-09-07 19:05:46 +02:00
Sandro La Bruzzo 3c6fc2096c fix bug on oai iterator that skip record cleaned 2021-09-07 10:46:26 +02:00
Sandro La Bruzzo d4dadf6d77 reduced max number of PID in Relatedentity 2021-09-02 14:21:24 +02:00
Sandro La Bruzzo 9f8a80deb7 fixed wrong import of unresolved relation in openaire 2021-09-01 14:16:27 +02:00
Alessia Bardi 3762b17f7b added VERSIOn and PART relationship and re-ordered according to my personal and obviously possibly biased
ordering
2021-08-31 20:20:05 +02:00
Sandro La Bruzzo e8b3cb9147 Implemented method to download delta updates in EBI Links 2021-08-30 09:32:45 +02:00
Alessia Bardi ccf4103a25 keep the original url if the decoder fails for any reason 2021-08-25 10:07:58 +02:00
Sandro La Bruzzo 45898c71ac fixed wrong doi in pubmed 2021-08-24 15:20:04 +02:00
Alessia Bardi 00a28c0080 originalId was renamed to acronym 2021-08-23 15:02:21 +02:00
Alessia Bardi f19b04d41b code formatting after mvn compile 2021-08-23 14:33:39 +02:00
Alessia Bardi 931f430129 Merge branch 'beta' into datasource_model_eosc_beta 2021-08-23 11:57:21 +02:00
Alessia Bardi 4c1474e693 Dealing with #6859#note-2: we have to decode URLs to avoid & and other chars encoded becasue of the original XML representation of data 2021-08-20 17:03:30 +02:00
Miriam Baglioni 5f8ccbc365 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2021-08-20 11:13:47 +02:00
Miriam Baglioni 882abb40e4 CrossrefDump - 2021-08-20 11:12:53 +02:00
Miriam Baglioni 45c62609af CrossrefDump - modified because parameter file was moved 2021-08-20 11:12:31 +02:00
Miriam Baglioni 35880c0e7b CrossrefDump - changed the wf to be able to resume from one of the steps 2021-08-20 11:11:35 +02:00
Miriam Baglioni f3b6c392c1 CrossrefDump - moving parameter file under folder crossref_dump_reader 2021-08-20 11:10:58 +02:00
Miriam Baglioni 65822400ce CrossrefDump - added new parameter file that was missing 2021-08-20 11:10:35 +02:00
Alessia Bardi a053e1513c different funders in blacklist from BETA and PROD aggregator 2021-08-19 11:32:27 +02:00
Alessia Bardi 812bd54c57 different funders in blacklist from BETA and PROD aggregator 2021-08-19 11:30:14 +02:00
Miriam Baglioni a65d3caaea Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2021-08-19 10:29:10 +02:00
Miriam Baglioni e5cf11d088 change open access route to result matching hbm to gold 2021-08-19 10:29:04 +02:00
Claudio Atzori 7c0c67bdd6 added mock pom 2021-08-13 17:45:53 +02:00
Claudio Atzori 82086f3422 fixed directory name 2021-08-13 17:42:14 +02:00
Claudio Atzori bc7068106c added crossref download oozie workflow 2021-08-13 17:19:44 +02:00
Claudio Atzori 2c0a05f11a manually merged PR#139 2021-08-13 17:15:53 +02:00
Claudio Atzori d43667d857 Merge pull request 'Automatic download of Crossref' (#138) from crossref_dw_wf into beta
Reviewed-on: D-Net/dnet-hadoop#138
2021-08-13 17:10:10 +02:00
Miriam Baglioni 5856ca8a7b merging with branch beta - resolved conflicts 2021-08-13 16:45:45 +02:00
Miriam Baglioni 6fec71e8d2 removed the specific of the infra we are running the wf from the wf name 2021-08-13 16:39:02 +02:00
Miriam Baglioni ed7e28490a change in sh 2021-08-13 16:19:01 +02:00
Claudio Atzori 7743d0f919 consolidated dnet wf profiles into the same submodule 2021-08-13 16:14:54 +02:00
Miriam Baglioni 6eb7508995 mergin with branch beta 2021-08-13 16:07:04 +02:00
Claudio Atzori f74adc4752 added DownloadCSV2 as alternative implementation of the same download procedure 2021-08-13 15:52:15 +02:00
Claudio Atzori 5f0903d50d fixed CSV downloader & tests 2021-08-13 14:17:54 +02:00
Claudio Atzori 17cefe6a97 [HBM] removed stale replace option 2021-08-13 12:43:59 +02:00
Claudio Atzori 7ee2757fcd fixed DownloadCSV parameters spec; workflow patching the hostedby replaces the graph content (publication, datasource) rather than creating a copy 2021-08-13 12:41:01 +02:00
Claudio Atzori c3ad4ab701 minor fixes 2021-08-13 12:23:15 +02:00
Claudio Atzori baed5e3337 test classes moved in specific components 2021-08-13 12:14:47 +02:00
Claudio Atzori 3359f73fcf cleanup & best practices 2021-08-13 12:00:42 +02:00
Miriam Baglioni f4ec81c92c mergin with branch beta 2021-08-13 10:31:35 +02:00
Miriam Baglioni dc8b05b39e Hosted By Map - changed the association with the datasource id for the hostedby element: there is no more the need to compute it. With the new HBM it is already the id in the graph 2021-08-13 10:18:25 +02:00
Miriam Baglioni 32fd75691f refactoring 2021-08-13 10:15:42 +02:00
Miriam Baglioni 01db1f8bc4 GetCSV refactoring - removed not needed import 2021-08-13 10:14:17 +02:00
Miriam Baglioni 964a46ca21 GetCSV refactoring - modified due to movement of classes 2021-08-13 10:11:18 +02:00
Miriam Baglioni eaf077fc34 GetCSV refactoring - removed not needed dependency 2021-08-13 10:08:58 +02:00
Miriam Baglioni 5f674efb0c moved dependency version in external pom 2021-08-13 10:07:53 +02:00
Miriam Baglioni 5cd5714530 GetCSV refactoring - added ignore annotation for fields not in input csv 2021-08-13 10:06:49 +02:00
Miriam Baglioni ed183d878e GetCSV refactoring - modified test classes due to change in the model of projects and programme 2021-08-13 09:28:51 +02:00
Miriam Baglioni 8769dd8eef GetCSV refactoring - refactoring due to movement of classes 2021-08-12 18:20:56 +02:00
Miriam Baglioni 6b9e1bf2e3 GetCSV refactoring - removing not needed dependency 2021-08-12 18:17:50 +02:00
Miriam Baglioni d57b2bb927 GetCSV refactoring - removing not needed dependency 2021-08-12 18:12:51 +02:00
Miriam Baglioni 9da74b544a GetCSV refactoring - refactoring due to movement of classes 2021-08-12 18:12:15 +02:00
Miriam Baglioni ab8abd61bb GetCSV refactoring - refactoring due to movement of classes 2021-08-12 18:11:07 +02:00
Miriam Baglioni 335a824e34 GetCSV refactoring - fixed issue 2021-08-12 18:10:10 +02:00
Miriam Baglioni f0845e9865 GetCSV refactoring - refactoring due to movement of classes 2021-08-12 18:04:58 +02:00
Miriam Baglioni 7a789423aa GetCSV refactoring - refactoring due to movement of classes 2021-08-12 18:04:27 +02:00
Miriam Baglioni e9fc3ef3bc GetCSV refactoring - changed to use the new class to get and write the csv file 2021-08-12 18:03:41 +02:00
Miriam Baglioni 4317211a2b GetCSV refactoring - refactoring due to movement 2021-08-12 18:03:14 +02:00
Miriam Baglioni b62cd656a7 GetCSV refactoring - changed the model to store only the information needed 2021-08-12 18:01:10 +02:00
Miriam Baglioni d36e925277 GetCSV refactoring - moved under model package 2021-08-12 18:00:21 +02:00
Miriam Baglioni 6e84b3951f GetCSV refactoring - moving classes to dhp-common that have dependency with GetCSV class (that was located in graph-mapper) 2021-08-12 17:57:41 +02:00
Claudio Atzori 9587d4aee8 Merge branch 'beta' into hostedbymap 2021-08-12 17:04:30 +02:00
Claudio Atzori 86d940044c added test to verify bad records from FWF-E-Book-Library 2021-08-12 11:32:56 +02:00
Claudio Atzori 8cdce59e0e [graph raw] let the mapping exceptions propagate 2021-08-12 11:32:26 +02:00
Miriam Baglioni 08dd2b2102 moving the dependency version to the external pom file 2021-08-11 18:09:41 +02:00
Miriam Baglioni ac417ca798 removed not needed test resource 2021-08-11 17:50:33 +02:00
Miriam Baglioni e33daaeee8 reverting 2021-08-11 17:46:19 +02:00
Miriam Baglioni 785db1d5b2 refactoring 2021-08-11 17:44:07 +02:00
Miriam Baglioni 95e5482bbb removing not needed dependency 2021-08-11 17:42:26 +02:00
Miriam Baglioni b966329833 reverting 2021-08-11 17:37:00 +02:00
Miriam Baglioni 8ad7c71417 reverting 2021-08-11 17:36:12 +02:00
Miriam Baglioni 0e1a6bec20 reverting 2021-08-11 17:32:29 +02:00
Miriam Baglioni c6a2a780a9 reverting 2021-08-11 17:30:17 +02:00
Miriam Baglioni b6b58bba28 reverting 2021-08-11 17:25:37 +02:00
Miriam Baglioni 804589eb30 reverting 2021-08-11 17:23:35 +02:00
Miriam Baglioni d688749ad9 reverting 2021-08-11 17:22:28 +02:00
Miriam Baglioni 524c06e028 reverting 2021-08-11 17:20:30 +02:00
Miriam Baglioni 7aa3260729 reverting 2021-08-11 17:18:45 +02:00
Miriam Baglioni 55fc500d8d reverting 2021-08-11 17:17:48 +02:00
Miriam Baglioni 8229632839 adding assertions to the mapping of the unibi part of gold list 2021-08-11 16:36:01 +02:00
Miriam Baglioni b1c6140ebf removed all comments in Italian 2021-08-11 16:23:33 +02:00
Miriam Baglioni 52c18c2697 removed not needed test class. Teh functionality has been moved 2021-08-11 16:16:55 +02:00
Miriam Baglioni 8da3a25cf6 merging with branch beta 2021-08-11 15:55:34 +02:00
Claudio Atzori 9f4db73f30 updated/fixed unit tests 2021-08-11 15:02:51 +02:00
Claudio Atzori 61d811ba53 suggestions from intellij 2021-08-11 12:18:20 +02:00