orcid-no-doi #43

Merged
claudio.atzori merged 45 commits from enrico.ottonello/dnet-hadoop:orcid-no-doi into master 3 years ago
Collaborator

orcid publications no doi dataset generation from dump files
claudio.atzori requested changes 4 years ago
claudio.atzori left a comment
Owner

In this review I checked basic code practices like

  • exception propagation
  • management of side cases (methods returning null)

Please indicate the rationale behind your choices and apply the requested changes.
@ -87,0 +87,4 @@
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-text</artifactId>
<version>1.8</version>
Owner

Versions of dependencies should be only declared in the main pom file. Please declare this dependency there (v1.8) and refer to it without overriding the version.
@ -56,3 +55,4 @@
fs = FileSystem.get(URI.create(hdfsServerUri.concat(workingPath)), conf);
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
Owner

Let the exception propagate and break the job
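A minimal sketch of the requested change (the wrapping exception type is an assumption; any unhandled failure reaching the driver would break the job):

```java
try {
    fs = FileSystem.get(URI.create(hdfsServerUri.concat(workingPath)), conf);
} catch (IOException e) {
    // Sketch: rethrow instead of printStackTrace(), so the job fails fast
    // and a missing HDFS connection cannot go unnoticed.
    throw new RuntimeException("Unable to get FileSystem for " + hdfsServerUri, e);
}
```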
@ -0,0 +128,4 @@
}
}
} catch (Exception e) {
Log
Owner

What is the reason for not handling this exception nor letting it propagate? I imagine that a malformed entry in the tar file could cause it, but in that case we should interrupt the procedure and deepen the analysis to spot the error. As it is, the error would likely go unnoticed, causing a drop in the number of output records.
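A hedged sketch of the suggested handling, assuming the catch block sits inside the tar-reading loop and that `entry` names the current `TarArchiveEntry` (both assumptions):

```java
try {
    // ... parse the current tar entry into a record ...
} catch (Exception e) {
    // Sketch: interrupt the procedure instead of swallowing the error,
    // so a malformed entry gets analysed rather than silently dropped.
    throw new RuntimeException("Failed to process tar entry " + entry.getName(), e);
}
```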
@ -0,0 +167,4 @@
return name.getAsString();
}
}
return null;
Owner

Is the caller expecting the `null`? Otherwise this would likely produce a NPE.
@ -0,0 +98,4 @@
"/eu/dnetlib/dhp/doiboost/orcidnodoi/mappings/typologies.json"));
typologiesMapping = new Gson().fromJson(tt, Map.class);
} catch (final Exception e) {
logger.error("loading typologies", e);
Owner

This should not happen as this is statically defined, but please let the exception propagate with some subclass of `Throwable` so that it will break immediately. Otherwise the `typologiesMapping` variable will stay defined as `null` causing the 1st usage to break with a NPE.
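A minimal sketch of the suggested fix, assuming the field can be assigned in a static initializer (the enclosing class name `PublicationToOaf` and the field type are assumptions); throwing `ExceptionInInitializerError` makes the class fail to load immediately instead of leaving the field `null`:

```java
private static Map<String, Map<String, String>> typologiesMapping;

static {
    try {
        final String tt = IOUtils
            .toString(
                PublicationToOaf.class
                    .getResourceAsStream("/eu/dnetlib/dhp/doiboost/orcidnodoi/mappings/typologies.json"));
        typologiesMapping = new Gson().fromJson(tt, Map.class);
    } catch (final Exception e) {
        // Fail class loading immediately; a missing or broken mapping file is fatal.
        throw new ExceptionInInitializerError(e);
    }
}
```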
@ -0,0 +117,4 @@
if (errorsGeneric != null) {
errorsGeneric.add(1);
}
return null;
Owner

Is the caller expecting the `null`? Otherwise this would likely produce a NPE.
Poster
Collaborator

yes, there is a filter for null values:

JavaRDD<Publication> oafPublicationRDD = enrichedWorksRDD
    .map(
        e -> {
            return (Publication) publicationToOaf
                .generatePublicationActionsFromJson(e._2());
        })
    .filter(p -> p != null);
@ -0,0 +124,4 @@
public Oaf generatePublicationActionsFromDump(final JsonObject rootElement) {
if (!isValid(rootElement)) {
return null;
Owner

Is the caller expecting the `null`? Otherwise this would likely produce a NPE.
Poster
Collaborator

yes, there is a filter for null values:

JavaRDD<Publication> oafPublicationRDD = enrichedWorksRDD
    .map(
        e -> {
            return (Publication) publicationToOaf
                .generatePublicationActionsFromJson(e._2());
        })
    .filter(p -> p != null);
@ -0,0 +175,4 @@
if (errorsInvalidTitle != null) {
errorsInvalidTitle.add(1);
}
return null;
Owner

Is the caller expecting the `null`? Otherwise this would likely produce a NPE.
Poster
Collaborator

yes, there is a check on the null value
@ -0,0 +245,4 @@
if (errorsInvalidType != null) {
errorsInvalidType.add(1);
}
return null;
Owner

Is the caller expecting the `null`? Otherwise this would likely produce a NPE.
Poster
Collaborator

yes, there is a filter on null values:

JavaRDD<Publication> oafPublicationRDD = enrichedWorksRDD
    .map(
        e -> {
            return (Publication) publicationToOaf
                .generatePublicationActionsFromJson(e._2());
        })
    .filter(p -> p != null);
@ -0,0 +256,4 @@
if (errorsNotFoundAuthors != null) {
errorsNotFoundAuthors.add(1);
}
return null;
Owner

Is the caller expecting the `null`? Otherwise this would likely produce a NPE.
Poster
Collaborator

yes, there is a filter on null values:

JavaRDD<Publication> oafPublicationRDD = enrichedWorksRDD
    .map(
        e -> {
            return (Publication) publicationToOaf
                .generatePublicationActionsFromJson(e._2());
        })
    .filter(p -> p != null);
@ -0,0 +328,4 @@
return authors;
}
return null;
Owner

Is the caller expecting the `null`? Otherwise this would likely produce a NPE.
Poster
Collaborator

yes, there is a check on the null value
@ -0,0 +333,4 @@
private List<String> createRepeatedField(final JsonObject rootElement, final String fieldName) {
if (!rootElement.has(fieldName)) {
return null;
Owner

Is the caller expecting the `null`? Otherwise this would likely produce a NPE.
Poster
Collaborator

yes, there is a check on the null value
@ -0,0 +336,4 @@
return null;
}
if (rootElement.has(fieldName) && rootElement.get(fieldName).isJsonNull()) {
return null;
Owner

Is the caller expecting the `null`? Otherwise this would likely produce a NPE.
Poster
Collaborator

yes, there is a check on the null value
@ -0,0 +340,4 @@
}
if (rootElement.get(fieldName).isJsonArray()) {
if (!isValidJsonArray(rootElement, fieldName)) {
return null;
Owner

Is the caller expecting the `null`? Otherwise this would likely produce a NPE.
Poster
Collaborator

yes, there is a check on the null value
@ -0,0 +387,4 @@
try {
pubDateJson = rootElement.getAsJsonObject(jsonKey);
} catch (Exception e) {
return null;
Owner

Is the caller expecting the `null`? Otherwise this would likely produce a NPE.
Poster
Collaborator

yes, there is this check on the value: StringUtils.isNotBlank
@ -0,0 +390,4 @@
return null;
}
if (pubDateJson == null) {
return null;
Owner

Is the caller expecting the `null`? Otherwise this would likely produce a NPE.
Poster
Collaborator

yes, there is this check on the value: StringUtils.isNotBlank
@ -0,0 +397,4 @@
final String day = getStringValue(pubDateJson, "day");
if (StringUtils.isBlank(year)) {
return null;
Owner

Is the caller expecting the `null`? Otherwise this would likely produce a NPE.
Poster
Collaborator

yes, there is this check on the value: StringUtils.isNotBlank
@ -0,0 +413,4 @@
if (isValidDate(pubDate)) {
return pubDate;
}
return null;
Owner

Is the caller expecting the `null`? Otherwise this would likely produce a NPE.
Poster
Collaborator

yes, there is this check on the value: StringUtils.isNotBlank
@ -0,0 +475,4 @@
private StructuredProperty mapStructuredProperty(String value, Qualifier qualifier, DataInfo dataInfo) {
if (value == null | StringUtils.isBlank(value)) {
return null;
Owner

Is the caller expecting the `null`? Otherwise this would likely produce a NPE.
@ -0,0 +487,4 @@
private Field<String> mapStringField(String value, DataInfo dataInfo) {
if (value == null || StringUtils.isBlank(value)) {
return null;
Owner

Is the caller expecting the `null`? Otherwise this would likely produce a NPE.
Poster
Collaborator

the value returned from this function is used to populate 2 fields of publication:
publication.setSource(value)
publication.setDateofacceptance(value)
@ -0,0 +20,4 @@
public static String getStringValue(final JsonObject root, final String key) {
if (root.has(key) && !root.get(key).isJsonNull())
return root.get(key).getAsString();
return null;
Owner

Is the caller expecting the `null`? Otherwise this would likely produce a NPE.
Poster
Collaborator

replaced the null value with a safer empty string
@ -0,0 +69,4 @@
final String id = (workNodes.get(0).getAttributes().get("put-code"));
workData.setId(id);
} else {
return null;
Owner

Is the caller expecting the `null`? Otherwise this would likely produce a NPE.
Poster
Collaborator

in this case a null value indicates that the original publication in XML format was not parsed, because mandatory information was not found; there is a check on this null value
@ -0,0 +29,4 @@
import eu.dnetlib.doiboost.orcidnodoi.similarity.AuthorMatcher;
import jdk.nashorn.internal.ir.annotations.Ignore;
public class OrcidNoDoiTest {
Owner

Why are the tests in this class marked as `@Ignore`? I see you spent some effort implementing them; I didn't dive much into the details, but I see assertions properly defined, so if they are valuable I suggest keeping the tests active.
claudio.atzori reviewed 4 years ago
Owner

Please remove the dependency version from here entirely. The version has to be declared *only* in the main pom file, just like all the other dependencies in the same pom file.
claudio.atzori requested changes 4 years ago
claudio.atzori left a comment
Owner

Please remove the dependency version entirely from `dhp-workflows/dhp-doiboost/pom.xml`. All dependency versions have to be declared only in the main pom file.
Poster
Collaborator

All the needed modifications are done for now. I updated the fork with the current master.
claudio.atzori reviewed 4 years ago
Owner

I still see the dependency version declared here. Please move it to the project's main pom under the _dependencyManagement_ section.
claudio.atzori requested changes 4 years ago
claudio.atzori left a comment
Owner

Please revise the dependency section in the pom file. The version for the dependency `org.apache.commons:commons-text` must be moved to the project's main pom file.

More specifically, I refer to this dependency: https://code-repo.d4science.org/enrico.ottonello/dnet-hadoop/src/branch/orcid-no-doi/dhp-workflows/dhp-doiboost/pom.xml#L90
Poster
Collaborator

I had already moved that dependency in the main pom.xml file, as you can see here https://code-repo.d4science.org/enrico.ottonello/dnet-hadoop/src/branch/orcid-no-doi/pom.xml#L689
Owner

> I had already moved that dependency in the main pom.xml file, as you can see here https://code-repo.d4science.org/enrico.ottonello/dnet-hadoop/src/branch/orcid-no-doi/pom.xml#L689

... well no, you moved the version property declaration into the main pom file. You should move the dependency declaration into the `dependencyManagement` section of the main pom file, and then, in the doiboost submodule, express the dependency towards `commons-text` without specifying any version. The rationale here is that ALL the versions for the external libraries we depend on must be expressed in a single place. Otherwise, if everybody starts indicating arbitrary library versions in each submodule, the classpath becomes a jungle of libraries, likely with conflicting versions of the same library.
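A sketch of the requested layout, with version `1.8` as indicated earlier (element placement is illustrative):

```xml
<!-- main pom.xml: the single place where the commons-text version is pinned -->
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.apache.commons</groupId>
            <artifactId>commons-text</artifactId>
            <version>1.8</version>
        </dependency>
    </dependencies>
</dependencyManagement>

<!-- dhp-workflows/dhp-doiboost/pom.xml: no version, it is inherited from above -->
<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-text</artifactId>
</dependency>
```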
Owner

I just noticed that in [SparkGenEnrichedOrcidWorks.java](https://code-repo.d4science.org/D-Net/dnet-hadoop/src/commit/fea2451658a30f36c751741cbbb103fdaee33d5b/dhp-workflows/dhp-doiboost/src/main/java/eu/dnetlib/doiboost/orcidnodoi/SparkGenEnrichedOrcidWorks.java) the output format is set to parquet. As the records in this set must be integrated in the graph via the so-called Actions Management system, the data created by this procedure should comply with the input format & model it requires, i.e. a `SequenceFile<org.apache.hadoop.io.Text, org.apache.hadoop.io.Text>` where

  • keys are defined as the entity type class fully qualified name (e.g. [`eu.dnetlib.dhp.schema.oaf.Publication`](https://code-repo.d4science.org/D-Net/dnet-hadoop/src/branch/master/dhp-schemas/src/main/java/eu/dnetlib/dhp/schema/oaf/Publication.java))
  • values are defined as [`eu.dnetlib.dhp.schema.action.AtomicAction`](https://code-repo.d4science.org/D-Net/dnet-hadoop/src/branch/master/dhp-schemas/src/main/java/eu/dnetlib/dhp/schema/action/AtomicAction.java)s, a simple wrapper class with just two fields: 1) `Class<T> clazz`; and 2) `T payload`; where `T extends Oaf`.
claudio.atzori requested changes 4 years ago
@ -0,0 +128,4 @@
.createDataset(
oafPublicationRDD.repartition(1).rdd(),
Encoders.bean(Publication.class));
publicationDataset
Owner

I just noticed the output format is set to parquet. As the records in this set must be integrated in the graph via the so-called Actions Management system, the data created by this procedure should comply with the input format & model it requires, i.e. a `SequenceFile<org.apache.hadoop.io.Text, org.apache.hadoop.io.Text>` as described in the comment above.
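A hedged sketch of what producing the expected format could look like, reusing the `oafPublicationRDD` from the earlier snippet; `outputPath`, the JavaSparkContext `sc`, and the Jackson `ObjectMapper` serialization are assumptions, not the author's actual code:

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import eu.dnetlib.dhp.schema.action.AtomicAction;
import eu.dnetlib.dhp.schema.oaf.Publication;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
import scala.Tuple2;

// Keys carry the fully qualified entity class name; values carry the
// JSON-serialized AtomicAction wrapping each Publication.
oafPublicationRDD
    .map(p -> new AtomicAction<>(Publication.class, p))
    .mapToPair(aa -> new Tuple2<>(
        new Text(aa.getClazz().getCanonicalName()),
        new Text(new ObjectMapper().writeValueAsString(aa))))
    .saveAsNewAPIHadoopFile(
        outputPath,
        Text.class,
        Text.class,
        SequenceFileOutputFormat.class,
        sc.hadoopConfiguration());
```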
claudio.atzori added the enhancement label 4 years ago
claudio.atzori self-assigned this 4 years ago
Poster
Collaborator

Now the dataset output is in the correct format.
enrico.ottonello closed this pull request 4 years ago
enrico.ottonello reopened this pull request 4 years ago
claudio.atzori closed this pull request 3 years ago
The pull request has been merged as faa977df7e.
You can also view command line instructions.

Step 1:

From your project repository, check out a new branch and test the changes.
git checkout -b enrico.ottonello-orcid-no-doi master
git pull orcid-no-doi

Step 2:

Merge the changes and update on Gitea.
git checkout master
git merge --no-ff enrico.ottonello-orcid-no-doi
git push origin master
Reference: D-Net/dnet-hadoop#43