dnet-hadoop

Commit Graph

Author	SHA1	Message	Date
Claudio Atzori	6cb0a9bff0	dedup wf directory structure aligned with project commons	2020-03-20 16:48:14 +01:00
Claudio Atzori	abe8fb69a2	added global properties, moved postprocessing script inside the oozie_app directory	2020-03-18 15:43:54 +01:00
Claudio Atzori	aeb01fa353	reading from newline delimited json textfiles instead of sequence files	2020-03-17 11:57:24 +01:00
Claudio Atzori	a3f184fd3f	added field websiteurl in related organizations	2020-03-10 17:08:58 +01:00
Claudio Atzori	0e95544495	fixed serialization for datasource subjects	2020-03-10 17:07:44 +01:00
Claudio Atzori	5e342a555c	no need to compute the inverse relClass, fixed text() in xpath expressions	2020-03-05 12:51:48 +01:00
Claudio Atzori	6ec04d4e02	specified column used to perform the join operation in the javadoc	2020-03-05 12:50:38 +01:00
Claudio Atzori	1e563bc15e	introduced distinct properties driving the resource usage for the XML record creation and the indexing phase	2020-03-04 10:55:11 +01:00
Claudio Atzori	bc7cfd5975	indexing workflow WIP: fixed projects fundingtree xml conversion, prioritized links between results and projects when limiting them to 100 in the join procedure	2020-03-02 17:03:07 +01:00
Claudio Atzori	60bc2b1a20	drop the hive DB before populating it from scratch	2020-02-27 10:10:55 +01:00
Claudio Atzori	6a73fd5da5	in order to reuse the same XmlRecordFactory across different tasks, the state of contexts must be one per record built	2020-02-21 09:17:19 +01:00
Claudio Atzori	33185fd0b7	ISLookupClientFactory moved in dhp-common	2020-02-19 16:56:38 +01:00
Claudio Atzori	ed76521d9b	removed stale test resources, will be re-added later on	2020-02-18 11:51:08 +01:00
Claudio Atzori	0f364605ff	removed stale tests, need to reimplement them anyway	2020-02-18 11:48:19 +01:00
Claudio Atzori	56d1810a66	working procedure for records indexing using Spark, via lib com.lucidworks.spark:spark-solr	2020-02-14 12:28:52 +01:00
Claudio Atzori	1fee6e2b7e	implemented XML records construction and serialization, indexing WIP	2020-02-13 16:53:27 +01:00
Claudio Atzori	7ba0f44d05	WIP	2020-01-30 18:21:07 +01:00
Claudio Atzori	49ef2f4eb1	removed input parameter specification, SparkXmlRecordBuilderJob doesn't need hive	2020-01-30 18:20:26 +01:00
Claudio Atzori	b5e1e2e5b2	reintegrated changes from `fcbc4ccd70`	2020-01-30 18:11:04 +01:00
Claudio Atzori	7bacd6812e	Merge branch 'provision_indexing' of https://code-repo.d4science.org/D-Net/dnet-hadoop into HEAD Conflicts: dhp-workflows/dhp-graph-provision/src/main/java/eu/dnetlib/dhp/graph/GraphJoiner.java dhp-workflows/dhp-graph-provision/src/main/java/eu/dnetlib/dhp/graph/MappingUtils.java dhp-workflows/dhp-graph-provision/src/main/java/eu/dnetlib/dhp/graph/RelatedEntity.java dhp-workflows/dhp-graph-provision/src/main/java/eu/dnetlib/dhp/graph/SparkXmlRecordBuilderJob.java	2020-01-30 17:59:46 +01:00
Claudio Atzori	b2691a3b0a	save adjacency list as JoinedEntity	2020-01-30 17:46:29 +01:00
Claudio Atzori	8c2aff99b0	joining entities using T x R x S, WIP: last representation based on LinkedEntity type	2020-01-29 15:40:33 +01:00
Claudio Atzori	fcbc4ccd70	a bit of docs doesn't hurt	2020-01-24 08:43:23 +01:00
Claudio Atzori	a55f5fecc6	joining entities using T x R x S method with groupByKey, WIP: making target objects (T) have lower memory footprint	2020-01-24 08:17:53 +01:00
Claudio Atzori	799929c1e3	joining entities using T x R x S method with groupByKey	2020-01-21 16:35:44 +01:00
Claudio Atzori	97c239ee0d	WIP: trying to find a way to build the records for the index	2020-01-16 12:02:28 +02:00
Claudio Atzori	7ba586d2e5	oozie workflow aimed to build the adjacency lists representation of the graph, needed to build the records to be indexed	2019-12-17 16:24:49 +01:00

77 Commits