BrBETA_dnet-hadoop

Commit Graph

Author	SHA1	Message	Date
Miriam Baglioni	16c54a96f8	removed pid dump	2020-11-04 17:11:32 +01:00
Miriam Baglioni	0cac5436ff	Merge branch 'dump' of code-repo.d4science.org:miriam.baglioni/dnet-hadoop into dump	2020-11-04 13:21:11 +01:00
Alessia Bardi	51808b5afd	Updated descriptions	2020-11-04 12:29:48 +01:00
Alessia Bardi	e6becf8659	Updated descriptions	2020-11-04 12:17:57 +01:00
Alessia Bardi	0abe0eee33	Updated descriptions	2020-11-04 12:15:30 +01:00
Alessia Bardi	f6ab238f5d	Updated descriptions	2020-11-04 11:50:47 +01:00
Miriam Baglioni	c010a8442f	fixed issue on test code	2020-11-03 17:26:51 +01:00
Miriam Baglioni	8ec7a61188	merge branch with master	2020-11-03 16:59:08 +01:00
Miriam Baglioni	c209284ca7	new schemas for the entities in the dump with added descriptions	2020-11-03 16:58:08 +01:00
Miriam Baglioni	08806deddf	added the splitSize non mandatory parameter. Default size 10G	2020-11-03 16:57:34 +01:00
Miriam Baglioni	7d2eda43ca	added new non-mandatory property publish to determine whether to publish the upload or leave it pending. Default value false	2020-11-03 16:57:01 +01:00
Miriam Baglioni	cbbb1bdc54	moved business logic to new class in common for handling the zip of the archives	2020-11-03 16:55:50 +01:00
Miriam Baglioni	d4382b54df	moved the tar archive with max size to common module	2020-11-03 16:54:50 +01:00
Claudio Atzori	5310e56dba	remove empty PIDs	2020-11-03 11:52:10 +01:00
Sandro La Bruzzo	754c86f33e	fixed test to work on jenkins	2020-11-02 09:35:01 +01:00
Miriam Baglioni	dabb33e018	changed the discriminant used to split the file	2020-10-30 17:52:22 +01:00
Miriam Baglioni	0fba08eae4	max allowed size per file 10 Gb	2020-10-30 16:05:55 +01:00
Miriam Baglioni	b828587252	prevent the code from cycling indefinitely	2020-10-30 15:01:25 +01:00
Miriam Baglioni	f747e303ac	classes for dumping of the graph as ttl file	2020-10-30 14:13:45 +01:00
Miriam Baglioni	16baf5b69e	formatting	2020-10-30 14:13:14 +01:00
Miriam Baglioni	a9eef9c852	added check for possible Optional value in relation dataInfo	2020-10-30 14:12:28 +01:00
Miriam Baglioni	5f4de9a962	formatting	2020-10-30 14:11:40 +01:00
Miriam Baglioni	14bf2e7238	added option to split dumps bigger than 40Gb into different files	2020-10-30 14:09:04 +01:00
Miriam Baglioni	78fdb11c3f	merge branch with master	2020-10-29 12:55:22 +01:00
Sandro La Bruzzo	1d9fdb7367	fixed spark memory issue in SparkSplitOafTODLIEntities	2020-10-28 12:30:32 +01:00
Miriam Baglioni	d2374e3b9e	added code to handle cases where the funding tree is not existing	2020-10-27 16:15:21 +01:00
Miriam Baglioni	5d3012eeb4	changed code to dump only the programme list and not the classification list	2020-10-27 16:14:18 +01:00
Miriam Baglioni	3241ec1777	added connection timeout and socket timeout 600 sec	2020-10-27 16:12:11 +01:00
Alessia Bardi	1425d810a8	testing mapping	2020-10-19 17:46:14 +02:00
Sandro La Bruzzo	fed711da80	Merge remote-tracking branch 'origin/master' into merge_record_to_common	2020-10-13 15:32:45 +02:00
Alessia Bardi	8775a64bc1	Merge pull request 'Merging different compatibility levels (pinocchio operator)' (#47 ) from merge_graph into master	2020-10-09 14:44:52 +02:00
Claudio Atzori	e751c1402f	Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop	2020-10-09 13:53:21 +02:00
Claudio Atzori	b961dc7d1e	added originalid to the fields in the result graph view	2020-10-09 13:53:15 +02:00
Sandro La Bruzzo	eec418cd26	moved AuthoreMerger into dhp-common	2020-10-08 10:33:55 +02:00
Sandro La Bruzzo	fe0a7870e6	Added test to check if merge authors works	2020-10-08 10:33:12 +02:00
Sandro La Bruzzo	cd9c377d18	adapted scholexplorer Dump generation to the new Dataset definition	2020-10-08 10:10:13 +02:00
Claudio Atzori	a3f37a9414	javadoc	2020-10-07 16:44:22 +02:00
Claudio Atzori	8d85a2fced	[BETA wf only] datasources involved in the merge operation don't obey the infra precedence policy, but rely on a custom behaviour that, given two datasources from beta and prod, returns the one from prod with the highest compatibility among the two	2020-10-07 16:28:52 +02:00
Miriam Baglioni	ae08b3c0dd	merge branch with master	2020-10-05 11:35:55 +02:00
Miriam Baglioni	11b7eaae09	changed the name of the folder where to store the context entity from context to communities_infrastructures	2020-10-05 11:24:54 +02:00
Miriam Baglioni	32bffb0134	changed the name from communities_infrastructures to communities_infrastuctures.json	2020-10-05 11:24:17 +02:00
Miriam Baglioni	25cbcf6114	changed to solve issues about names. context renamed communities_infrastructure.json and removed the double json.gz extension from the name of the part in the tar	2020-10-02 12:17:46 +02:00
Claudio Atzori	49ae3450a9	code formatting	2020-10-02 09:43:24 +02:00
Claudio Atzori	c2a6e2a9bf	fixed mapping for datasource journal info (ISSNs)	2020-10-02 09:37:08 +02:00
Miriam Baglioni	01117a46e1	whole workflow activated	2020-10-01 17:19:21 +02:00
Miriam Baglioni	cfb5766c6b	removed double json.gz from names of files in the tar	2020-10-01 17:18:34 +02:00
Miriam Baglioni	fcaedac980	merge branch with master	2020-10-01 16:46:59 +02:00
Miriam Baglioni	c6e6ed1bd8	merge branch with master	2020-10-01 16:24:41 +02:00
Claudio Atzori	2e9e13444d	author pids made unique by value	2020-10-01 12:50:40 +02:00
Claudio Atzori	e265c3e125	cleaning functions factored out in a dedicated class	2020-10-01 10:50:15 +02:00


594 Commits