2490 Commits

Author SHA1 Message Date
Claudio Atzori 93f7b7974f Merge pull request 'trust truncated to 3 decimals' (#24) from trunc_trust into master 2020-11-13 15:40:02 +01:00
    LGTM
Claudio Atzori 2facfefc19 updated maven repository URL 2020-11-13 15:38:40 +01:00
Claudio Atzori 528231a287 grouping graph entities by id turned out to be an easy extension for the already existing cleaning workflow 2020-11-13 15:37:48 +01:00
Enrico Ottonello 005f849674 added compression to output dataset 2020-11-13 12:45:31 +01:00
Enrico Ottonello 9a2fa9dc2f added test for other names parsing from summaries dump 2020-11-13 10:25:34 +01:00
Claudio Atzori 2bed29eb09 WIP: added oozie workflow for grouping graph entities by id 2020-11-13 10:05:12 +01:00
Claudio Atzori 13e36a4da0 WIP: added oozie workflow for grouping graph entities by id 2020-11-13 10:05:02 +01:00
Enrico Ottonello 13f28fa225 moved AuthorData to dhp-schemas; added other names to author data 2020-11-12 17:43:32 +01:00
Enrico Ottonello 2af21150c5 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi 2020-11-12 09:58:33 +01:00
Claudio Atzori 9b0fb9e958 merged from master 2020-11-12 09:27:12 +01:00
Claudio Atzori 75324ae58a Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-11-12 09:23:37 +01:00
Claudio Atzori 822971f54f no need to filter relations in CreateRelatedEntitiesJob_phase1; replaced 'left outer' join with 'left' join in CreateRelatedEntitiesJob_phase2; cleanup; 2020-11-12 09:22:59 +01:00
Enrico Ottonello 1f861f2b0d now wf output is a sequence file with the format seq("eu.dnetlib.dhp.schema.oaf.Publication",eu.dnetlib.dhp.schema.action.AtomicActions) 2020-11-11 17:38:50 +01:00
Claudio Atzori 9841488482 Merge pull request 'latest changes in stats wf' (#54) from antonis.lempesis/dnet-hadoop:master into master 2020-11-11 16:01:51 +01:00
    LGTM, thanks!
Antonis Lempesis 99ebaee347 fixed #5913 2020-11-11 16:56:46 +02:00
Claudio Atzori e3d3481fb9 Merge pull request 'organizations pids' (#53) from organization_pids into master 2020-11-11 14:08:25 +01:00
    LGTM
Antonis Lempesis f14e65f6a3 reverted wrong change 2020-11-10 17:23:04 +02:00
Antonis Lempesis c02c7741c9 fixes in db creation 2020-11-10 17:11:30 +02:00
Antonis Lempesis e603fa5847 fixes in db creation 2020-11-10 17:11:12 +02:00
Enrico Ottonello fea2451658 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi 2020-11-10 11:49:43 +01:00
Claudio Atzori 18d9aad70c improved documentation in dhp-graph-provision 2020-11-10 11:48:55 +01:00
Enrico Ottonello 1513174d7e added further test case 2020-11-10 11:44:55 +01:00
Michele Artini 40160d171f organizations pids 2020-11-09 12:58:36 +01:00
Sandro La Bruzzo 8e1d43aab2 Implemented ID generation using IdentifierRecordFactory on DOIBoost 2020-11-09 11:53:55 +01:00
Sandro La Bruzzo 027ef2326c Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-11-06 17:12:42 +01:00
Sandro La Bruzzo cd27df91a1 fixed bug on missing relation in ANDS 2020-11-06 17:12:31 +01:00
Enrico Ottonello 6bc7dbeca7 first version of dataset successful generated from orcid dump 2020 2020-11-06 13:47:50 +01:00
Claudio Atzori d10447e747 re-packaged graph dump workflow sources 2020-11-05 17:38:18 +01:00
Claudio Atzori 2d76497488 cleanup 2020-11-05 17:10:24 +01:00
Claudio Atzori 144216fb88 Merge pull request 'OpenAIRE graph dump' (#51) from miriam.baglioni/dnet-hadoop:dump into master 2020-11-05 17:09:52 +01:00
    LGTM
Miriam Baglioni f8e9bda24c merge branch with master 2020-11-05 16:31:18 +01:00
Miriam Baglioni afa0b1489b merge upstream 2020-11-05 16:31:09 +01:00
Miriam Baglioni 7ebdfacee9 removed commented code and added documentation to new method 2020-11-05 16:30:36 +01:00
Miriam Baglioni be5ed8f554 added check to avoid sending empty metadata. 2020-11-05 16:10:17 +01:00
Claudio Atzori 2148a51fae minor changes 2020-11-05 11:24:12 +01:00
Claudio Atzori 4625b7486e code formatting 2020-11-04 18:12:43 +01:00
Claudio Atzori f5f346dd2b Merge pull request 'dump' (#50) from miriam.baglioni/dnet-hadoop:dump into master 2020-11-04 18:07:01 +01:00
    LGTM
Miriam Baglioni e9ac471ae9 removed dependency from classes for the pid graph dump 2020-11-04 18:04:42 +01:00
Miriam Baglioni f45c23316f removed entities added for the pid graph dump 2020-11-04 17:31:24 +01:00
Miriam Baglioni e9d948786d removed commented code 2020-11-04 17:30:51 +01:00
Miriam Baglioni b90a945c49 removed property files for pid graph dump 2020-11-04 17:28:33 +01:00
Miriam Baglioni bac307155a removed properties specific for pid graph dump 2020-11-04 17:28:04 +01:00
Miriam Baglioni 9c9d50f486 removed code specific for pid graph dump 2020-11-04 17:26:22 +01:00
Miriam Baglioni 5669890934 removed commented lines 2020-11-04 17:15:21 +01:00
Miriam Baglioni 6a89f59be9 removed commented lines 2020-11-04 17:13:59 +01:00
Miriam Baglioni 56150d7e5e removed all code related to the dump of pids graph 2020-11-04 17:13:12 +01:00
Miriam Baglioni 16c54a96f8 removed pid dump 2020-11-04 17:11:32 +01:00
Claudio Atzori e5da4ee9b1 dedup workflow using the common PidComparator 2020-11-04 15:02:02 +01:00
Miriam Baglioni d9d8de63cc merge upstream 2020-11-04 13:36:38 +01:00
Miriam Baglioni 0cac5436ff Merge branch 'dump' of code-repo.d4science.org:miriam.baglioni/dnet-hadoop into dump 2020-11-04 13:21:11 +01:00