Unnecessary JavaRDD conversion? #163

Open
opened 2021-11-18 17:22:32 +01:00 by claudio.atzori · 0 comments

https://code-repo.d4science.org/D-Net/dnet-hadoop/src/commit/a24b9f8268a3e36a5032d40a103793677ab1dd8b/dhp-workflows/dhp-dedup-openaire/src/main/java/eu/dnetlib/dhp/oa/dedup/SparkCopyRelationsNoOpenorgs.java#L63

Why is it necessary to convert to a JavaRDD? Wouldn't it be more straightforward to define it as:

```java
spark
	.read()
	.textFile(relationPath)
	.map(patchRelFn(), Encoders.bean(Relation.class))
	.filter((FilterFunction<Relation>) x -> !isOpenorgs(x))
	.write()
	.mode(SaveMode.Overwrite)
	.json(outputPath);
```
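The suggestion amounts to a single typed chain (map, then filter, then write) with no detour through a JavaRDD. Because the Spark version needs a SparkSession to run, the sketch below mimics the same dataflow with plain `java.util.stream` so it can be run standalone. Everything in it is hypothetical: the simplified `Relation` record, the comma-separated input format parsed by `patchRel`, and the `openorgs____` ID-prefix test in `isOpenorgs` are assumptions for illustration, not the dnet-hadoop implementations.

```java
import java.util.List;
import java.util.stream.Collectors;

public class RelationPipelineSketch {

    // Hypothetical, simplified stand-in for the project's Relation bean.
    record Relation(String source, String target, String relClass) {}

    // Hypothetical parser: the real job deserializes JSON via patchRelFn();
    // here each input line is assumed to be "source,target,relClass".
    static Relation patchRel(String line) {
        String[] f = line.split(",", -1);
        return new Relation(f[0].trim(), f[1].trim(), f[2].trim());
    }

    // Hypothetical predicate: treats a relation as OpenOrgs if either end
    // carries an "openorgs____" ID prefix. The real isOpenorgs may differ.
    static boolean isOpenorgs(Relation r) {
        return r.source().contains("openorgs____") || r.target().contains("openorgs____");
    }

    // Same shape as the suggested Dataset chain: parse, drop OpenOrgs
    // relations, serialize, without converting to another collection type midway.
    static List<String> copyRelationsNoOpenorgs(List<String> lines) {
        return lines.stream()
            .map(RelationPipelineSketch::patchRel)
            .filter(r -> !isOpenorgs(r))
            .map(r -> r.source() + "," + r.target() + "," + r.relClass())
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical sample relations; only the first one should survive.
        List<String> input = List.of(
            "20|openaire____::a,20|grid________::b,merges",
            "20|openorgs____::c,20|grid________::d,merges");
        copyRelationsNoOpenorgs(input).forEach(System.out::println);
    }
}
```

In the Spark job the analogous point is that the data stays a typed `Dataset<Relation>` (via `Encoders.bean(Relation.class)`) from `textFile` through `write()`, rather than being converted to a JavaRDD and back.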
michele.debonis was assigned by claudio.atzori 2021-11-18 17:22:32 +01:00

Reference: D-Net/dnet-hadoop#163