orcid-no-doi #43
@@ -128,6 +128,8 @@ public class SparkGenEnrichedOrcidWorks {
 				})
 			.filter(p -> p != null);
+
+		sc.hadoopConfiguration().set("mapreduce.output.fileoutputformat.compress", "true");
 		oafPublicationRDD
 			.mapToPair(
 				p -> new Tuple2<>(p.getClass().toString(),
I just noticed the output format being set to parquet. As the records in this set must be integrated in the graph via the so-called Actions Management system, the data created by this procedure should comply with the input format & model it requires, i.e. a SequenceFile<org.apache.hadoop.io.Text, org.apache.hadoop.io.Text> whose values contain the serialized records (here eu.dnetlib.dhp.schema.oaf.Publication) wrapped in eu.dnetlib.dhp.schema.action.AtomicActions, a simple wrapper class with just two fields: 1) Class<T> clazz; and 2) T payload; where T extends Oaf.
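For illustration only, here is a minimal self-contained sketch of the wrapper shape described above. The real class is eu.dnetlib.dhp.schema.action.AtomicAction from dnet-hadoop; the stand-in Oaf and Publication types below are hypothetical stubs, not the actual schema classes.

```java
// Hypothetical stand-in for eu.dnetlib.dhp.schema.oaf.Oaf
class Oaf {}

// Hypothetical stand-in for eu.dnetlib.dhp.schema.oaf.Publication
class Publication extends Oaf {
	String title;

	Publication(String title) {
		this.title = title;
	}
}

// Sketch mirroring eu.dnetlib.dhp.schema.action.AtomicAction:
// a simple wrapper pairing the payload's concrete class with the payload itself.
class AtomicAction<T extends Oaf> {
	Class<T> clazz; // 1) the concrete payload type
	T payload;      // 2) the record to be merged into the graph

	AtomicAction(Class<T> clazz, T payload) {
		this.clazz = clazz;
		this.payload = payload;
	}
}

public class AtomicActionSketch {
	public static void main(String[] args) {
		AtomicAction<Publication> aa =
			new AtomicAction<>(Publication.class, new Publication("example"));
		System.out.println(aa.clazz.getName() + " / " + aa.payload.title);
	}
}
```

In the actual pipeline each AtomicAction would be serialized (e.g. to JSON) and written as the value of a SequenceFile<Text, Text> record instead of parquet.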