dnet-hadoop

Commit Graph

Author	SHA1	Message	Date
Claudio Atzori	fbdd18a96b	using dataset based relation preparation procedure	2020-04-07 08:54:39 +02:00
Claudio Atzori	77f59b1b10	dataset based provision WIP	2020-04-06 19:37:27 +02:00
Claudio Atzori	e355961997	dataset based provision WIP	2020-04-06 17:34:25 +02:00
Claudio Atzori	ca345aaad3	dataset based provision WIP	2020-04-06 15:33:31 +02:00
Claudio Atzori	c8f4b95464	dataset based provision WIP	2020-04-06 08:59:58 +02:00
Claudio Atzori	eb2f5f3198	dataset based provision WIP	2020-04-04 17:41:31 +02:00
Claudio Atzori	3d1b637cab	dataset based provision WIP	2020-04-04 14:03:43 +02:00
Claudio Atzori	24b2c9012e	dataset based provision WIP	2020-04-02 18:44:09 +02:00
Claudio Atzori	daa26acc9d	dataset based provision WIP, fixed spark2EventLogDir	2020-04-02 16:15:50 +02:00
Claudio Atzori	9c7092416a	dataset based provision WIP	2020-04-01 19:07:30 +02:00
Claudio Atzori	1402eb1fe7	cleanup	2020-04-01 15:38:50 +02:00
Claudio Atzori	adcdd2d05e	WIP: reimplementing the adjacency list construction process using spark Datasets	2020-04-01 14:56:57 +02:00
Claudio Atzori	0fbec69b82	use oozie prepare statement to cleanup working directories	2020-03-30 19:48:41 +02:00
Claudio Atzori	f3f9affd49	allow dynamic executors to build XML records	2020-03-30 13:12:11 +02:00
Claudio Atzori	2e2d4c4c68	adjusted path to template resource	2020-03-30 13:11:49 +02:00
Claudio Atzori	673e744649	moved openaire specific implementations under dedicated package eu.dnetlib.dhp.oa	2020-03-27 10:42:17 +01:00
Michele Artini	ebe45003d9	fixed some junit packages	2020-03-25 16:45:03 +01:00
Claudio Atzori	abe8fb69a2	added global properties, moved postprocessing script inside the oozie_app directory	2020-03-18 15:43:54 +01:00
Claudio Atzori	aeb01fa353	reading from newline delimited json textfiles instead of sequence files	2020-03-17 11:57:24 +01:00
Claudio Atzori	a3f184fd3f	added field websiteurl in related organizations	2020-03-10 17:08:58 +01:00
Claudio Atzori	0e95544495	fixed serialization for datasource subjects	2020-03-10 17:07:44 +01:00
Claudio Atzori	5e342a555c	no need to compute the inverse relClass, fixed text() in xpath expressions	2020-03-05 12:51:48 +01:00
Claudio Atzori	6ec04d4e02	specified column used to perform the join operation in the javadoc	2020-03-05 12:50:38 +01:00
Claudio Atzori	1e563bc15e	introduced distinct properties driving the resource usage for the XML record creation and the indexing phase	2020-03-04 10:55:11 +01:00
Claudio Atzori	bc7cfd5975	indexing workflow WIP: fixed projects fundingtree xml conversion, prioritized links between results and projects when limiting them to 100 in the join procedure	2020-03-02 17:03:07 +01:00
Claudio Atzori	60bc2b1a20	drop the hive DB before populating it from scratch	2020-02-27 10:10:55 +01:00
Claudio Atzori	6a73fd5da5	in order to reuse the same XmlRecordFactory across different tasks, the state of contexts must be one per record built	2020-02-21 09:17:19 +01:00
Claudio Atzori	33185fd0b7	ISLookupClientFactory moved in dhp-common	2020-02-19 16:56:38 +01:00
Claudio Atzori	ed76521d9b	removed stale test resources, will be re-added later on	2020-02-18 11:51:08 +01:00
Claudio Atzori	0f364605ff	removed stale tests, need to reimplement them anyway	2020-02-18 11:48:19 +01:00
Claudio Atzori	56d1810a66	working procedure for records indexing using Spark, via lib com.lucidworks.spark:spark-solr	2020-02-14 12:28:52 +01:00
Claudio Atzori	1fee6e2b7e	implemented XML records construction and serialization, indexing WIP	2020-02-13 16:53:27 +01:00
Claudio Atzori	7ba0f44d05	WIP	2020-01-30 18:21:07 +01:00
Claudio Atzori	49ef2f4eb1	removed input parameter specification, SparkXmlRecordBuilderJob doesn't need hive	2020-01-30 18:20:26 +01:00
Claudio Atzori	b5e1e2e5b2	reintegrated changes from `fcbc4ccd70`	2020-01-30 18:11:04 +01:00
Claudio Atzori	7bacd6812e	Merge branch 'provision_indexing' of https://code-repo.d4science.org/D-Net/dnet-hadoop into HEAD Conflicts: dhp-workflows/dhp-graph-provision/src/main/java/eu/dnetlib/dhp/graph/GraphJoiner.java dhp-workflows/dhp-graph-provision/src/main/java/eu/dnetlib/dhp/graph/MappingUtils.java dhp-workflows/dhp-graph-provision/src/main/java/eu/dnetlib/dhp/graph/RelatedEntity.java dhp-workflows/dhp-graph-provision/src/main/java/eu/dnetlib/dhp/graph/SparkXmlRecordBuilderJob.java	2020-01-30 17:59:46 +01:00
Claudio Atzori	b2691a3b0a	save adjacency list as JoinedEntity	2020-01-30 17:46:29 +01:00
Claudio Atzori	8c2aff99b0	joining entities using T x R x S, WIP: last representation based on LinkedEntity type	2020-01-29 15:40:33 +01:00
Claudio Atzori	fcbc4ccd70	a bit of docs doesn't hurt	2020-01-24 08:43:23 +01:00
Claudio Atzori	a55f5fecc6	joining entities using T x R x S method with groupByKey, WIP: making target objects (T) have lower memory footprint	2020-01-24 08:17:53 +01:00
Claudio Atzori	799929c1e3	joining entities using T x R x S method with groupByKey	2020-01-21 16:35:44 +01:00
Claudio Atzori	97c239ee0d	WIP: trying to find a way to build the records for the index	2020-01-16 12:02:28 +02:00
Claudio Atzori	7ba586d2e5	oozie workflow aimed to build the adjacency lists representation of the graph, needed to build the records to be indexed	2019-12-17 16:24:49 +01:00

293 Commits