Commit Graph

1050 Commits

Author SHA1 Message Date
Miriam Baglioni 43cae4ad88 Merge branch 'dump' of https://code-repo.d4science.org/D-Net/dnet-hadoop into dump 2021-11-12 11:36:54 +01:00
Miriam Baglioni b3f9370125 merge with beta - resolved conflict in pom 2021-11-12 11:25:26 +01:00
Miriam Baglioni ffb0ce1d59 merge with beta - resolved conflict in pom 2021-11-12 10:19:59 +01:00
Sandro La Bruzzo a7763d2492 removed alternate identifier in resolutionMap 2021-11-12 09:56:45 +01:00
Miriam Baglioni b8bdabfae9 [Graph Dump] removed OpenAccessRoute from test in best access right 2021-11-11 16:16:48 +01:00
Miriam Baglioni e5498052e8 [Graph Dump] removed OpenAccessRoute from test in best access right 2021-11-11 16:14:10 +01:00
Miriam Baglioni 935062edec [Bypass Action Set] creation of unresolved entities 2021-11-11 16:11:25 +01:00
Sandro La Bruzzo 2ca0a436ad added SparkResolveEntities node to the oozie wf 2021-11-11 10:25:42 +01:00
Sandro La Bruzzo 9cb195314f implemented and tested resolution of entities 2021-11-11 10:17:40 +01:00
Miriam Baglioni 8cc50ecee0 [Graph Dump] changed AccessRight with BestAccessRight in the dump and modified the dependency to the schema to the SNAPSHOT 2021-11-11 08:59:20 +01:00
Miriam Baglioni 88b73f4f49 merging with branch beta 2021-11-10 17:00:52 +01:00
Sandro La Bruzzo 6477a40670 implement filter of openCitation 2021-11-09 11:27:12 +01:00
Miriam Baglioni 94918a673c [Graph DUMP] Fix issue for empty originalId list 2021-11-08 10:25:28 +01:00
Miriam Baglioni 8442efd8d1 [Graph DUMP] Filtering out from the originalIds the id of the result in OpenAIRE 2021-11-05 12:29:22 +01:00
Claudio Atzori 5681e89544 Update 'dhp-workflows/dhp-graph-mapper/src/main/resources/eu/dnetlib/dhp/oa/graph/dump/schemas/result_schema.json' 2021-11-05 12:18:24 +01:00
Miriam Baglioni a22c29fba1 [Graph DUMP] Filtering out from the originalIds the id of the result in OpenAIRE 2021-11-05 12:08:33 +01:00
Miriam Baglioni c10ff6928c [Graph DUMP] add schema of the dump related to the model as in dhp-schemas.2.8.31. Note the measure element at the level of the result has been removed because of issues on where to display it: at the level of the result or at the level of the entity 2021-11-05 11:36:21 +01:00
Miriam Baglioni 0857849a86 [Graph DUMP] Remove dump of measure until it will be clear where to put it (at the level of result or at the level of the instance) 2021-11-05 11:02:37 +01:00
Sandro La Bruzzo 7bd224f051 implement first version of scholexplorer integration for the generation of final graph 2021-11-02 15:58:15 +01:00
Claudio Atzori 1225ba0b92 [resolution] increasing number of partitions to avoid OOM 2021-10-28 16:18:17 +02:00
Sandro La Bruzzo d9cbca83f7 moved filter on next phase 2021-10-28 16:13:24 +02:00
Sandro La Bruzzo 1be9aa0a5f Removed filter of datacite items from the raw graph merging phase, Datacite is not an actionset anymore in beta 2021-10-26 17:52:20 +02:00
Sandro La Bruzzo 4acfa8fa2e Scholexplorer Datasource Aggregation: added collectedfrom in the inverse relation generated. Relation resolution: increased number of partitions in workflow.xml; using classid instead of classname to build the pid-dnetId mapping 2021-10-26 17:51:20 +02:00
Sandro La Bruzzo 034304b33a conflict resolved on merge 2021-10-26 09:40:47 +02:00
Claudio Atzori d147295c2f avoiding java.io.NotSerializableException: java.util.HashMap 2021-10-21 14:15:57 +02:00
Claudio Atzori 3702fe478d cleanup 2021-10-21 12:05:02 +02:00
Sandro La Bruzzo ac36aa7d1c fixed wrong Encoding during a map phase 2021-10-21 11:35:02 +02:00
Sandro La Bruzzo ae4e99a471 Adapted workflow of resolution of PID to work into OpenAIRE data workflow; added relations in both directions on all Scholexplorer datasources 2021-10-20 17:12:16 +02:00
Claudio Atzori 00b78b9c58 cleanup: mapping contents in the graph already defined in the OAF graph model doesn't require to be aware of the vocabularies 2021-10-20 14:04:45 +02:00
Claudio Atzori c01dd0c925 registered oaf model classes for the KryoSerializer 2021-10-20 13:55:07 +02:00
Claudio Atzori 515e068a78 Merge branch 'beta' into hierarchical_orgs_relations 2021-10-19 16:46:06 +02:00
Claudio Atzori 512e7b0170 code formatting 2021-10-19 16:19:29 +02:00
Claudio Atzori e9157c67aa Merge branch 'beta' into dump 2021-10-19 16:15:03 +02:00
Claudio Atzori 98f37c8d81 WIP: workflow nodes for including Scholexplorer records in the RAW graph 2021-10-19 16:14:40 +02:00
Claudio Atzori c8850456e9 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2021-10-19 16:09:54 +02:00
Claudio Atzori 7a73010acd WIP: workflow nodes for including Scholexplorer records in the RAW graph 2021-10-19 11:59:16 +02:00
Miriam Baglioni c7f6cd2591 added again the setting for saXReader 2021-10-19 10:15:26 +02:00
miconis 5f780a6ba1 bug fix in migrate entities: parameter name was wrong 2021-10-18 23:30:40 +02:00
Miriam Baglioni 1315952702 merge with branch beta 2021-10-18 14:17:09 +02:00
Sandro La Bruzzo 7b15b88d4c renamed wrong package, implemented last aggregation workflow for scholexplorer 2021-10-15 15:00:15 +02:00
Sandro La Bruzzo 51a03c0a50 refactor code for EBI from dhp-graph-mapper into dhp-aggregation 2021-10-14 14:23:13 +02:00
miconis 995c1eddaf minor change 2021-10-13 17:07:10 +02:00
miconis 326bf63775 integration of parent child orgs relations 2021-10-13 12:24:48 +02:00
Miriam Baglioni 63933808d4 added fix for mixing result types, added configuration default to funder subworkflow 2021-10-13 11:28:28 +02:00
Miriam Baglioni fec40bdd95 merging with branch beta - resolved conflicts 2021-10-12 09:16:36 +02:00
Miriam Baglioni 83f51f1812 refactoring 2021-10-12 09:14:43 +02:00
Sandro La Bruzzo 5606014b17 code refactor see ticket #7065 2021-10-12 08:11:53 +02:00
Sandro La Bruzzo 2557bb41f5 Implemented new method for update baseline inside scala node 2021-10-06 16:41:08 +02:00
Sandro La Bruzzo b84e0cabeb Implemented new method for update baseline 2021-10-05 16:34:47 +02:00
Sandro La Bruzzo 991b06bd0b removed generation of EBI links from old dump, now EBI link dump is created by another wf 2021-10-05 10:21:33 +02:00
Miriam Baglioni e653756e3d applied some suggestions from SonarLint 2021-10-04 18:40:07 +02:00
Miriam Baglioni 9814c3e700 merging with branch beta 2021-10-01 13:00:03 +02:00
Miriam Baglioni c4ccd7b32c - 2021-10-01 12:59:47 +02:00
Miriam Baglioni c8321ad31a merge with branch beta 2021-10-01 12:59:08 +02:00
Claudio Atzori 60a6a9a583 [graph2hive] added field 'measures' to the result view 2021-09-30 09:27:26 +02:00
Claudio Atzori ebf53a1616 added cleaning for relation fields: subRelType & relClass according to dedicated vocabs 2021-09-15 16:10:37 +02:00
Sandro La Bruzzo e8b3cb9147 Implemented method to download delta updates in EBI Links 2021-08-30 09:32:45 +02:00
Alessia Bardi ccf4103a25 keep the original url if the decoder fails for any reason 2021-08-25 10:07:58 +02:00
Sandro La Bruzzo 45898c71ac fixed wrong doi in pubmed 2021-08-24 15:20:04 +02:00
Alessia Bardi 00a28c0080 originalId was renamed to acronym 2021-08-23 15:02:21 +02:00
Alessia Bardi f19b04d41b code formatting after mvn compile 2021-08-23 14:33:39 +02:00
Alessia Bardi 931f430129 Merge branch 'beta' into datasource_model_eosc_beta 2021-08-23 11:57:21 +02:00
Alessia Bardi 4c1474e693 Dealing with #6859#note-2: we have to decode URLs to avoid & and other chars encoded because of the original XML representation of data 2021-08-20 17:03:30 +02:00
Miriam Baglioni e5cf11d088 change open access route to result matching hbm to gold 2021-08-19 10:29:04 +02:00
Claudio Atzori f74adc4752 added DownloadCSV2 as alternative implementation of the same download procedure 2021-08-13 15:52:15 +02:00
Claudio Atzori 5f0903d50d fixed CSV downloader & tests 2021-08-13 14:17:54 +02:00
Claudio Atzori 17cefe6a97 [HBM] removed stale replace option 2021-08-13 12:43:59 +02:00
Claudio Atzori 7ee2757fcd fixed DownloadCSV parameters spec; workflow patching the hostedby replaces the graph content (publication, datasource) rather than creating a copy 2021-08-13 12:41:01 +02:00
Claudio Atzori c3ad4ab701 minor fixes 2021-08-13 12:23:15 +02:00
Claudio Atzori baed5e3337 test classes moved in specific components 2021-08-13 12:14:47 +02:00
Claudio Atzori 3359f73fcf cleanup & best practices 2021-08-13 12:00:42 +02:00
Miriam Baglioni f4ec81c92c merging with branch beta 2021-08-13 10:31:35 +02:00
Miriam Baglioni 32fd75691f refactoring 2021-08-13 10:15:42 +02:00
Miriam Baglioni 01db1f8bc4 GetCSV refactoring - removed not needed import 2021-08-13 10:14:17 +02:00
Miriam Baglioni 964a46ca21 GetCSV refactoring - modified due to movement of classes 2021-08-13 10:11:18 +02:00
Claudio Atzori 9587d4aee8 Merge branch 'beta' into hostedbymap 2021-08-12 17:04:30 +02:00
Claudio Atzori 86d940044c added test to verify bad records from FWF-E-Book-Library 2021-08-12 11:32:56 +02:00
Claudio Atzori 8cdce59e0e [graph raw] let the mapping exceptions propagate 2021-08-12 11:32:26 +02:00
Miriam Baglioni 785db1d5b2 refactoring 2021-08-11 17:44:07 +02:00
Miriam Baglioni b966329833 reverting 2021-08-11 17:37:00 +02:00
Miriam Baglioni 8ad7c71417 reverting 2021-08-11 17:36:12 +02:00
Miriam Baglioni 0e1a6bec20 reverting 2021-08-11 17:32:29 +02:00
Miriam Baglioni c6a2a780a9 reverting 2021-08-11 17:30:17 +02:00
Miriam Baglioni 8229632839 adding assertions to the mapping of the unibi part of gold list 2021-08-11 16:36:01 +02:00
Miriam Baglioni b1c6140ebf removed all comments in Italian 2021-08-11 16:23:33 +02:00
Miriam Baglioni 8da3a25cf6 merging with branch beta 2021-08-11 15:55:34 +02:00
Claudio Atzori 9f4db73f30 updated/fixed unit tests 2021-08-11 15:02:51 +02:00
Claudio Atzori 61d811ba53 suggestions from intellij 2021-08-11 12:18:20 +02:00
Claudio Atzori 2ee21da43b suggestions from SonarLint 2021-08-11 12:13:22 +02:00
Miriam Baglioni b954fe9ba8 merging with branch beta 2021-08-11 10:12:46 +02:00
Miriam Baglioni b688567db5 hostedbymap - modified part of test to check the bestaccessright changed 2021-08-11 10:12:10 +02:00
Miriam Baglioni 9731a6144a hostedbymap - in case the journal is open access the access may be changed also for the best access right in the result 2021-08-10 17:49:45 +02:00
Miriam Baglioni a90bac3bc9 Graph Dump - added method to test class to verify addition of validation date in projects for community result 2021-08-09 16:36:54 +02:00
Miriam Baglioni bd0d7bfba7 Graph Dump - added resources for testing addition of validation date in project for communityresult 2021-08-09 16:36:17 +02:00
Miriam Baglioni 8daaa32e90 Graph Dump - added resources for testing 2021-08-09 15:46:29 +02:00
Miriam Baglioni bc9e3a06ba Graph Dump - extended the test class 2021-08-09 15:46:06 +02:00
Miriam Baglioni 2efa5abda5 refactoring 2021-08-09 12:28:36 +02:00
Miriam Baglioni eff499af9f added new tests and changed the test example 2021-08-09 11:12:30 +02:00
Miriam Baglioni c3931557e3 extended the logic of the dump to consider the validation date in the relation (also in the dumped result for communities and funders at the level of the project), the extension on the instance for the APC, the pid, the alternate identifiers, and the extension of the AccessRight to store the OpenAccessRoute. Added new resource for testing and extended the old class to verify the new dump. Fixed also issue on relation dump: only relations whose source and target are entities in the graph are dumped. The same holds for references to projects 2021-08-06 18:56:18 +02:00
Miriam Baglioni 6bd1eca7e0 merge branch with beta 2021-08-05 15:23:32 +02:00