239 Commits

Author SHA1 Message Date
Miriam Baglioni e47ea9349c extended some types by adding provenance as the couple (provenance, trust) and moved some classes to be used by the complete graph dump also 2020-07-20 17:46:27 +02:00
Claudio Atzori 32f5e466e3 imports cleanup 2020-07-20 17:42:58 +02:00
Sandro La Bruzzo a7d3977481 added generation of EBI Dataset 2020-07-10 14:44:50 +02:00
Miriam Baglioni df80ae5c1b merge branch with fork master 2020-06-22 10:51:23 +02:00
Claudio Atzori 7d416f08d8 graph cleaning workflow: set hostedby to unknown repository when defined as NULL 2020-06-22 09:50:43 +02:00
Miriam Baglioni 65bf312360 merge branch with fork master 2020-06-18 11:35:27 +02:00
Miriam Baglioni 8211cbb9fe extension of Result to contain all the properties owned by any result type 2020-06-18 11:23:52 +02:00
Miriam Baglioni bc8611a95a added new resources for testing 2020-06-18 11:19:20 +02:00
Sandro La Bruzzo 9bf67f5de1 resolved conflicts 2020-06-17 09:15:43 +02:00
Sandro La Bruzzo 1d4275acc4 implemented first version of exportation of Scholexplorer into ActionSet 2020-06-17 09:10:38 +02:00
miconis 5233b15265 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-06-16 18:31:19 +02:00
miconis 11b77b9f4e json dumps for entity merge test modified to fit the new model. title merge adjusted to fix the error 2020-06-16 18:31:11 +02:00
Claudio Atzori 306669209f code formatting 2020-06-16 16:54:44 +02:00
Claudio Atzori 603b1bd0bb Merge branch 'master' into dhp_oaf_model 2020-06-16 15:43:59 +02:00
Miriam Baglioni 9dd3ef22c5 merge branch with fork master 2020-06-15 11:23:26 +02:00
Miriam Baglioni 56e70573c2 - 2020-06-15 11:06:56 +02:00
Miriam Baglioni 20b9e67728 added new class funder 2020-06-15 11:06:18 +02:00
Claudio Atzori 463489f59f code formatting 2020-06-12 12:03:25 +02:00
Claudio Atzori 4bcad1c9c3 Merge branch 'graph_cleaning' 2020-06-12 11:40:25 +02:00
Alessia Bardi ed8879ed8b deprecate PUBLICATION_DATASET 2020-06-12 10:55:56 +02:00
Alessia Bardi 3ade2631b3 Constants for new rels: citations and reviews 2020-06-12 10:52:12 +02:00
Claudio Atzori ba8a024af9 avoid NPEs merging titles 2020-06-12 10:45:11 +02:00
Claudio Atzori a2fdf85ba1 WIP: graph cleaner implementation 2020-06-09 19:52:53 +02:00
Miriam Baglioni 206abba48c merge branch with fork master 2020-06-09 15:41:14 +02:00
Miriam Baglioni 5121cbaf6a new classes for external dump. Only classes functional to dump products 2020-06-09 15:37:46 +02:00
Miriam Baglioni f232db84e9 new classes for external dump. Only classes functional to dump products 2020-06-08 15:11:37 +02:00
Claudio Atzori 25a093b1a4 integrated changes from master 2020-06-08 15:04:00 +02:00
Claudio Atzori 45973b5743 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-06-08 15:01:34 +02:00
Claudio Atzori 94533b71bc added comments for model fields removal 2020-06-08 15:01:21 +02:00
Claudio Atzori b2f9564f13 WIP: fixed PrepareRelationsJob; parallel implementation of CreateRelatedEntitiesJob_phase2, now works by OafType; introduced custom aggregator in AdjacencyListBuilderJob 2020-05-29 10:58:15 +02:00
Sandro La Bruzzo b87b3ddb6b changed mapping ORCIDToOAF 2020-05-29 09:32:04 +02:00
Sandro La Bruzzo 7d29b61c62 code refactor 2020-05-28 09:57:46 +02:00
Miriam Baglioni dd1e0b93b8 added merge for Programme 2020-05-27 17:40:32 +02:00
Miriam Baglioni f3dcca0dd0 added equals for programme 2020-05-27 17:23:34 +02:00
Miriam Baglioni 92e3a52e91 merge branch with fork master 2020-05-26 15:57:51 +02:00
Claudio Atzori 7b288a94cb code formatting 2020-05-26 09:54:13 +02:00
Claudio Atzori ae04234472 DataInfo.deletedbyinference is false by default 2020-05-25 19:32:48 +02:00
miconis da1e5cf557 implementation of the result title merge. main title with higher trust, distinct between the others 2020-05-25 18:02:57 +02:00
Claudio Atzori 4b34872b44 using Objects.equals to check Field<T> equivalence 2020-05-25 10:14:15 +02:00
Claudio Atzori 0ab0206b4d removed null objects from flattened Field<T> in mergeLists 2020-05-25 10:11:41 +02:00
Claudio Atzori de108f54d6 code formatting 2020-05-23 10:21:19 +02:00
Claudio Atzori 6b56cae57d added mapping for bestaccessrights 2020-05-23 09:57:39 +02:00
Miriam Baglioni 24daa1deaa added to the Project class a new field that is the list of programmes 2020-05-20 10:28:16 +02:00
Miriam Baglioni d323100af0 added the new Programme POJO. It contains the code and the description of the programme 2020-05-20 10:27:27 +02:00
Miriam Baglioni 22cb9e0da7 simple code to get file from URL 2020-05-15 18:18:01 +02:00
Miriam Baglioni 3aaad753fd Merge branch 'master' into dhp_oaf_model 2020-05-15 15:55:23 +02:00
Claudio Atzori b7e198475a added common methods to create HiveDB table identifiers 2020-05-15 10:20:07 +02:00
Miriam Baglioni 42085e8d99 added some constants 2020-05-14 18:22:28 +02:00
Claudio Atzori c6b028f2af code formatting 2020-05-11 17:38:08 +02:00
Claudio Atzori 637653cba3 integrated changes from master 2020-05-11 14:05:25 +02:00
Miriam Baglioni 871e079b45 merged with master 2020-05-11 10:20:00 +02:00
Miriam Baglioni 391b2399cc merge upstream 2020-05-11 10:08:51 +02:00
Claudio Atzori 42f1a2bf94 bumped project version to 1.2.0-SNAPSHOT 2020-05-11 10:05:57 +02:00
Miriam Baglioni 32301451ec merge upstream 2020-05-11 09:42:23 +02:00
Miriam Baglioni 28556507e7 - 2020-05-08 12:54:52 +02:00
Miriam Baglioni 4c94231cad merge with master fork 2020-05-08 12:25:57 +02:00
Claudio Atzori 62ea19f1d3 introduced mapping for ExternalReferences, made urls defined within an instance unique 2020-05-08 09:43:26 +02:00
Miriam Baglioni 182225becb Merge branch 'master' of https://code-repo.d4science.org/miriam.baglioni/dnet-hadoop 2020-05-07 11:38:17 +02:00
Miriam Baglioni 5efae3acb9 new workflow for job3 2020-05-07 11:38:10 +02:00
Claudio Atzori 128c3bf1c8 restored Author bean with simple getter/setter, author pid addition moved into dedicated implementation SparkOrcidToResultFromSemRelJob3 2020-05-07 11:14:56 +02:00
Claudio Atzori 17860d3ab6 general changes in the RAW graph mapping: missing collectedfrom/hostedby causes records to be skipped; factored out most of the constants in ModelConstants class (dhp-schemas) 2020-05-06 13:20:02 +02:00
Claudio Atzori 405f495d54 code formatting 2020-05-04 19:18:12 +02:00
Claudio Atzori 11938dac5e this commit adds: validated/validationDate to relationships; measure type and simple unit test to indicate the relative serialization 2020-05-04 16:47:07 +02:00
Claudio Atzori 24d8d097b6 sync with master branch 2020-05-04 16:44:13 +02:00
Claudio Atzori de5fbe325c bits of javadoc 2020-05-04 16:00:48 +02:00
Miriam Baglioni 4b0bd91012 - 2020-04-30 12:45:28 +02:00
Miriam Baglioni 3abb76ff7a merge with upstream 2020-04-30 11:15:54 +02:00
Miriam Baglioni 638a3c465b - 2020-04-30 11:05:17 +02:00
Miriam Baglioni 564e5d6279 added new information in support of blacklist reader 2020-04-30 10:22:58 +02:00
Claudio Atzori 439c6255a2 cleanup 2020-04-29 19:09:07 +02:00
Miriam Baglioni 869f576273 added hash map for relationship entityType id prefix, and relation inverse 2020-04-29 18:14:52 +02:00
Miriam Baglioni b85ad7012a reads the blacklist from the blacklist db and writes it as a set of relations on hdfs 2020-04-29 17:29:49 +02:00
Miriam Baglioni f7695e833c resolved conflicts 2020-04-29 11:41:31 +02:00
Claudio Atzori 6f5b899038 reformatted code according to the updated style descriptor 2020-04-28 11:23:29 +02:00
Claudio Atzori a0bdbacdae switched automatic code formatting plugin to net.revelc.code.formatter:formatter-maven-plugin 2020-04-27 14:52:31 +02:00
Claudio Atzori 7a3f8085f7 switched automatic code formatting plugin to net.revelc.code.formatter:formatter-maven-plugin 2020-04-27 14:45:40 +02:00
Miriam Baglioni 5dccbe13db merge with upstream 2020-04-27 10:43:59 +02:00
Claudio Atzori 268462623a refined definition of equals and hash methods for Oaf model classes, now based on entity identifier, while relations consider sourceid, targetid and relationship semantic; Factored out function to group Oaf objects in grouping operations; Raw graph creation procedure merges entities and relationships providing the same identity 2020-04-24 14:42:01 +02:00
Claudio Atzori 5100527400 added default value for resulttype field 2020-04-23 19:14:37 +02:00
Miriam Baglioni 04fc223346 add method addPid 2020-04-23 11:07:44 +02:00
Miriam Baglioni 259525cb93 Merge remote-tracking branch 'upstream/master' 2020-04-21 18:33:46 +02:00
Claudio Atzori d772d967aa restored changes from master branch 2020-04-20 18:53:06 +02:00
miconis 4da13e4570 Revert "Merge branch 'master' into deduptesting" This reverts commit 772f75d167, reversing changes made to 5f45f2c77f. 2020-04-20 16:04:49 +02:00
Claudio Atzori d714bfb4d4 collectedfrom field moved in common parent class Oaf.java 2020-04-20 12:25:19 +02:00
Miriam Baglioni 454b8a6a29 Merge remote-tracking branch 'upstream/master' 2020-04-18 14:09:44 +02:00
Claudio Atzori ad7a131b18 introduced common project code formatting plugin, works on the commit hook, based on https://github.com/Cosium/git-code-format-maven-plugin, applied to each java class in the project 2020-04-18 12:42:58 +02:00
Miriam Baglioni 7d9fd75020 add method addPid 2020-04-17 17:13:48 +02:00
Sandro La Bruzzo 5e2fa996aa fixed problem with conversion of long into string 2020-04-17 12:11:51 +02:00
Sandro La Bruzzo c36239e693 fixed incremental indexing 2020-04-14 17:47:36 +02:00
Claudio Atzori cc67dbff81 typo in text 2020-04-14 17:11:55 +02:00
Claudio Atzori 8b2043c7b1 introducing List<KeyValue> generic container for Relation specific properties. Ref ticket https://issue.openaire.research-infrastructures.eu/issues/5512 2020-04-14 16:43:40 +02:00
Claudio Atzori d74e128aa6 Utility classes moved in dhp-common and dhp-schemas 2020-04-07 11:56:22 +02:00
Claudio Atzori c57cf679ca Merge branch 'provision_dataset' 2020-04-07 08:56:58 +02:00
Claudio Atzori 3d1b637cab dataset based provision WIP 2020-04-04 14:03:43 +02:00
Przemysław Jacewicz 51ff3b4e81 [dhp-schemas] added safeguard against casting exception in mergeFrom methods and null-safe handling of collectedfrom collection for relation 2020-04-01 18:28:23 +02:00
przemek 9d1d18d4b9 Merge branch 'master' into przemyslawjacewicz_actionmanager_impl_prototype 2020-03-31 12:04:58 +02:00
Sandro La Bruzzo 0cd022ad6a merge with master 2020-03-26 14:08:29 +01:00
przemek 638b78f96a Merge remote-tracking branch 'origin/master' into przemyslawjacewicz_actionmanager_impl_prototype 2020-03-19 15:12:56 +01:00
Claudio Atzori 1850a02ae4 added simpler, AtomicAction replacement, based on the dhp.Oaf model 2020-03-19 10:44:16 +01:00
Claudio Atzori 23a929177d updates to the graph require this to be an actual class 2020-03-13 14:56:35 +01:00
Sandro La Bruzzo addaaa091f migrate relation from RDD to Dataset 2020-03-13 09:13:20 +01:00
Przemysław Jacewicz f7454a9ed8 Added equals and hashCode for OAF types 2020-03-11 16:57:28 +01:00
Michele Artini 4c94e74a84 Added a missing dependency 2020-02-20 11:43:32 +01:00
Claudio Atzori d42dde52ba implemented method to merge relations 2020-02-19 17:29:05 +01:00
Claudio Atzori 11cfd6bd9a integrated changes from master branch 2020-02-13 17:27:07 +01:00
Claudio Atzori bbf1b611b9 refereed, processingchargeamount and processingchargecurrency moved inside the Instance element. Introduced specific type to model Result's countries 2020-02-13 17:21:11 +01:00
Claudio Atzori d3b96f102b builder pattern screws up the Parquet schema inference method, avoid using it in the bean definitions 2020-02-04 14:10:58 +01:00
Claudio Atzori ed290ca8d7 builder pattern 2020-02-03 10:35:51 +01:00
Claudio Atzori 1ecca69f49 added annotation to ignore method during the serialization 2020-01-30 17:45:28 +01:00
Sandro La Bruzzo 19a80e4638 implemented workfow for aggregation and generation of infospace graph 2020-01-24 09:58:55 +01:00
Claudio Atzori 799929c1e3 joining entities using T x R x S method with groupByKey 2020-01-21 16:35:44 +01:00
Sandro La Bruzzo fa7504bf29 removed DLI stuff should be in a branch 2020-01-20 10:28:00 +01:00
Claudio Atzori 749b0660ab instance URLs must be repeatable 2020-01-17 14:22:15 +01:00
Sandro La Bruzzo b4392f9f43 implemented DedupRecord factory for missing entities 2019-12-13 09:40:02 +01:00
miconis 545e940007 implementation of the mergeFrom for the Datasources 2019-12-12 15:36:41 +01:00
Sandro La Bruzzo 39367676d7 implemented DedupRecord factory with the merge of project 2019-12-12 15:18:48 +01:00
Sandro La Bruzzo 6b45e37e22 implemented DedupRecord factory with the merge of organizations 2019-12-11 16:57:37 +01:00
Sandro La Bruzzo abd9034da0 implemented DedupRecord factory with the merge of publications 2019-12-11 15:43:24 +01:00
miconis 4b66b471a4 implementation of the sorting by trust mechanism and the merge of oaf entities 2019-12-10 14:57:16 +01:00
Claudio Atzori 6a7bee5e43 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2019-11-14 15:43:07 +01:00
Claudio Atzori 0c4b316f82 align Result model with the latest OpenAIRE schema changes introduced in the protobuf model 2019-11-14 15:42:52 +01:00
Sandro La Bruzzo aad0cb40b7 Added schema Scholexplorer 2019-11-14 10:34:09 +01:00
Claudio Atzori 2243089b78 Author PIDs include also provenance information 2019-11-07 17:38:37 +01:00
Claudio Atzori 32ed4ae8d6 conversion utilities from protobuffer model to DHP model moved in dnet-mapreduce-jobs. Removed also the relative protobuf dependencies 2019-11-04 12:28:56 +01:00
Sandro La Bruzzo 9ee4e5a196 remove a bit of syntactic sugar on the object inheritance :( 2019-10-25 18:10:30 +02:00
miconis 4908165e05 implementation of the createPublication method to map publications 2019-10-25 11:54:14 +02:00
Claudio Atzori 4eaff36ea6 a bit of syntactic sugar on the object inheritance 2019-10-25 10:55:35 +02:00
Claudio Atzori b0aa7cd7fb fluent setters 2019-10-25 09:53:08 +02:00
Claudio Atzori 4b331790e7 resolved conflicts 2019-10-25 09:45:12 +02:00
Claudio Atzori c929c1dfac more proto 2 graph model mappings 2019-10-25 09:25:36 +02:00
Sandro La Bruzzo 09ffda03a2 removed circular dependencies 2019-10-25 09:24:18 +02:00
Claudio Atzori d46371ceab Merge branch 'master' of https://code-repo.d2science.org/D-Net/dnet-hadoop 2019-10-24 17:43:55 +02:00
Claudio Atzori 0d88f9a6a4 added mapping for projects 2019-10-24 17:43:42 +02:00
Sandro La Bruzzo 2dd9572f41 added Mapping of OriginalDescription 2019-10-24 17:36:44 +02:00
Sandro La Bruzzo 6c32d418ac added conversion of ExtraInfo 2019-10-24 17:26:55 +02:00
Claudio Atzori 52abfcfac7 Field<T> is an actual class, fluent setters 2019-10-24 17:17:12 +02:00
Claudio Atzori d8bfaa3687 added mapping for relations 2019-10-24 17:04:13 +02:00
Claudio Atzori d38aeb8c6e DataInfo.provenanceaction not repeatable, fluent setters 2019-10-24 16:55:38 +02:00
Sandro La Bruzzo 25a62b79e5 added new model for information space dataframes 2019-10-24 11:39:41 +02:00