dnet-hadoop

Commit Graph

Author	SHA1	Message	Date
Antonis Lempesis	dd3d6a6e15	compute stats for the used and new impala tables	2020-07-24 19:50:40 +03:00
Antonis Lempesis	e6f50de6ef	Separated impala from hive steps	2020-07-24 19:50:40 +03:00
Antonis Lempesis	de49173420	fixed a typo in queries	2020-07-24 19:50:40 +03:00
antleb	391cf80fb8	Added peer-reviewed, green, gold tables and fields in result. Added shortcuts from result-country	2020-07-24 19:50:40 +03:00
antleb	68389d0125	Corrected the script used by the last step of the wf	2020-07-24 19:50:40 +03:00
antleb	ec52141f1a	changed refereed type from value to classname	2020-07-24 19:50:40 +03:00
Spyros Zoupanos	63cd797aba	Comment out step 15 to make it work with the new schema of Claudio	2020-07-24 19:50:40 +03:00
Spyros Zoupanos	138c6ddffa	Insert statement to datasource table that takes into account the piwik_id of the openAIRE graph	2020-07-24 19:50:40 +03:00
Spyros Zoupanos	3630794cef	Fix to consider the relationships that have been 'virtually deleted' for project_results - defect #5607	2020-07-24 19:50:40 +03:00
Spyros Zoupanos	5546f29e63	Corrections on the shadow schema and the impala table stats calculation	2020-07-24 19:50:40 +03:00
Spyros Zoupanos	adf8a025d2	Adding more relations (Sources, Licences, Additional) and shadow schema as provided and discussed with Antonis Lempesis	2020-07-24 19:50:40 +03:00
Spyros Zoupanos	657a40536b	Corrections by Spyros: Script cleanup, corrections and re-arrangement	2020-07-24 19:50:40 +03:00
Giorgos Alexiou	477fa6234d	Script re-organisation and adding table invalidations needed for impala	2020-07-24 19:50:40 +03:00
Claudio Atzori	56bbfdc65d	introduced parameter 'numParitions', driving the hive DB table data partitioning. Currently specified only for table 'project'	2020-07-23 08:54:10 +02:00
Sandro La Bruzzo	9ab594ccf6	fixed test	2020-07-21 10:36:21 +02:00
Claudio Atzori	ebf60020ac	map results as OPRs in case of missing //CobjCategory/@type and the vocabulary dnet:result_typologies doesn't resolve the super type	2020-07-20 19:01:10 +02:00
Claudio Atzori	32f5e466e3	imports cleanup	2020-07-20 17:42:58 +02:00
Claudio Atzori	54ac583923	code formatting	2020-07-20 17:37:08 +02:00
Claudio Atzori	124e7ce19c	in case of missing attribute //dr:CobjCategory/@type the resulttype is derived by looking up the vocabulary dnet:result_typologies with the 1st instance type available	2020-07-20 17:33:37 +02:00
Claudio Atzori	050dda223d	Merge pull request 'removed duplicated fields' (#25) from unique_field_in_lists into master. Looks good as a temporary workaround. I agree the model could seamlessly make the distinct operation by using HashSets instead of Linked (or Array) Lists. The task to update the model in such a way is added on #9#issuecomment-1583 Thanks!	2020-07-20 12:12:50 +02:00
Claudio Atzori	e0c4cf6f7b	added parameter to drive the graph merge strategy: priority (BETA|PROD)	2020-07-20 10:48:01 +02:00
Claudio Atzori	94ccdb4852	Merge branch 'master' into merge_graph	2020-07-20 10:14:55 +02:00
Claudio Atzori	0937c9998f	Merge branch 'deduptesting'	2020-07-20 10:00:20 +02:00
Claudio Atzori	105176105c	updated dnet-pace-core dependency to version 4.0.4 to include the latest clustering function	2020-07-20 09:59:47 +02:00
Claudio Atzori	de72b1c859	cleanup	2020-07-20 09:59:11 +02:00
Michele Artini	331a3cbdd0	fixed originalId	2020-07-20 09:50:29 +02:00
Michele Artini	c59c5369b1	Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop	2020-07-18 09:40:54 +02:00
Michele Artini	346a1d2b5a	update eventId generator	2020-07-18 09:40:36 +02:00
Sandro La Bruzzo	9116d75b3e	Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop	2020-07-17 18:01:30 +02:00
Miriam Baglioni	47c7122773	changed priority from beta to production	2020-07-17 12:56:35 +02:00
Michele Artini	442f30930c	removed duplicated fields	2020-07-17 12:25:36 +02:00
Claudio Atzori	1781609508	code formatting	2020-07-16 19:06:56 +02:00
Claudio Atzori	db8b90a156	renamed CORE -> BETA	2020-07-16 19:05:13 +02:00
Claudio Atzori	878f2b931c	Merge branch 'master' into merge_graph	2020-07-16 16:34:24 +02:00
Claudio Atzori	cc5d13da85	introduced parameter shouldIndex (true|false)	2020-07-16 13:46:39 +02:00
Claudio Atzori	b098cc3cbe	avoid repeating identical values for fields: source, description	2020-07-16 13:45:53 +02:00
Claudio Atzori	805de4eca1	fix: filter the blocks with size = 1	2020-07-16 10:11:32 +02:00
Claudio Atzori	4b9fb2ffb8	Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop	2020-07-15 11:26:04 +02:00
Claudio Atzori	5033c25587	code formatting	2020-07-15 11:26:00 +02:00
Claudio Atzori	b90389bac4	code formatting	2020-07-15 11:24:48 +02:00
Claudio Atzori	4e6f46e8fa	filter blocks with one record only	2020-07-15 11:22:20 +02:00
Michele Artini	262c29463e	relations with multiple datasources	2020-07-15 09:18:40 +02:00
Claudio Atzori	7d6e269b40	reverted CreateRelatedEntitiesJob_phase1 to its previous state	2020-07-13 22:54:04 +02:00
Claudio Atzori	8e97598eb4	avoid NPE in case of null instances	2020-07-13 20:46:14 +02:00
Claudio Atzori	06def0c0cb	SparkBlockStats allows to repartition the input rdd via the numPartitions workflow parameter	2020-07-13 20:09:06 +02:00
miconis	b52c246aed	merge done	2020-07-13 19:57:02 +02:00
miconis	b8a45041fd	minor changes	2020-07-13 19:53:18 +02:00
Claudio Atzori	66f9f6d323	adjusted parameters for the dedup stats workflow	2020-07-13 19:26:46 +02:00
miconis	03ecfa5ebd	implementation of the test class for the new block stats spark action	2020-07-13 18:48:23 +02:00
miconis	10e08ccf45	Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop	2020-07-13 18:22:45 +02:00


1388 Commits, All Branches