dump of the OpenAIRE graph #40

Closed
miriam.baglioni wants to merge 0 commits from miriam.baglioni/dnet-hadoop:dump into master

This PR concerns two dumps of the OpenAIRE Research Graph.
To provide these dumps, external models (derived from the internal OpenAIRE one) have been defined.
The dhp-schemas module has been extended with the following new packages and classes:

  1. eu.dnetlib.dhp.schema.dump.oaf: contains the classes shared by the two dumps:
    • Qualifier. Represents information described by a code and a value.
    • KeyValue. Represents information described by a key and a value.
    • AccessRight. Represents the access rights of a result.
    • APC. Will be used to represent the Article Processing Charge information. Not dumped in this release.
    • ControlledField. Represents information described by a scheme and a value in that scheme (e.g. a pid).
    • Provenance. Indicates the process that produced (or provided) the information, and the trust associated with it.
    • Pid. Represents a generic persistent identifier together with its provenance.
    • Author. Represents a generic author of the result.
    • Container (only for results of type publication). Stores information about the conference or journal where the result has been presented or published.
    • Country. Represents the country associated with the result.
    • ExternalReference. To be changed and described in the next PR.
    • GeoLocation (only for results of type dataset). Represents the geolocation information.
    • Instance. Represents the manifestations (i.e. different versions) of the result. For example, the pre-print and the published version are two manifestations of the same research result.
    • Subject. Represents the keywords associated with the result.
    • Result. Represents the dumped result. It is extended in the dump for Research Communities.
  2. eu.dnetlib.dhp.schema.dump.oaf.community: contains the classes specific to the dump of Research Communities - Research Initiatives/Infrastructures:
    • Context. A reference to a relevant research infrastructure, initiative or community (RI/RC) among those collaborating with OpenAIRE.
    • Project. Stores information about the project related to the result. This information is not mapped directly from the result in the internal model, because it is not there. Instead, the mapped result is enriched with project information derived from the relations between results and projects, as described below.
    • Funder. Stores information about the funder of the project related to the result.
    • CommunityResult. Extends eu.dnetlib.dhp.schema.dump.oaf.Result with fields that store information about Projects and Context.
  3. eu.dnetlib.dhp.schema.dump.oaf.graph: contains the classes specific to the dump of the whole graph:
    • Datasource. Stores information about a datasource OpenAIRE collects information from.
    • Organization. Stores information about an organization.
    • Funder. Stores information about the funder of a project.
    • Fundings. Stores information about the funding scheme.
    • Granted. Stores information about the amount granted to the project.
    • Levels. To be removed in the next PR.
    • Node. Stores information about the source/target of a relation.
    • Programme. Stores information about the programme the project is related to.
    • Project. Stores information about the project related to the result.
    • Relation. Stores information about a relation involving generic entities.
    • RelType. Stores information about the semantics of a relation.
    • ResearchCommunity. Stores information about Research Communities (it extends ResearchInitiative).
    • ResearchInitiative. Stores information about a Research Initiative.
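The sketch below is a rough illustration of how the community dump could extend the common model. It shows `CommunityResult` adding community-specific lists on top of `Result`. This is a simplified sketch: the fields (`id`, `title`, `projects`, `context`) and the reduced `Project` and `Context` classes are assumptions, not the actual dhp-schemas definitions. All classes implement `Serializable`, like the dump classes in this PR, presumably so that they can be used in the Spark jobs.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for the common dump Result; field names are assumptions.
class Result implements Serializable {
    private String id;
    private String title;

    public String getId() { return id; }
    public void setId(String id) { this.id = id; }
    public String getTitle() { return title; }
    public void setTitle(String title) { this.title = title; }
}

// Hypothetical, reduced community Project: only a code and an acronym.
class Project implements Serializable {
    private String code;
    private String acronym;

    public Project() {}
    public Project(String code, String acronym) { this.code = code; this.acronym = acronym; }
    public String getCode() { return code; }
    public String getAcronym() { return acronym; }
}

// Hypothetical, reduced Context: the RC/RI identifier and its label.
class Context implements Serializable {
    private String code;
    private String label;

    public Context() {}
    public Context(String code, String label) { this.code = code; this.label = label; }
    public String getCode() { return code; }
    public String getLabel() { return label; }
}

// CommunityResult inherits everything from Result and adds the community-specific information.
class CommunityResult extends Result {
    private List<Project> projects = new ArrayList<>();
    private List<Context> context = new ArrayList<>();

    public List<Project> getProjects() { return projects; }
    public void setProjects(List<Project> projects) { this.projects = projects; }
    public List<Context> getContext() { return context; }
    public void setContext(List<Context> context) { this.context = context; }
}
```

With this layout, the whole-graph dump can keep using `Result` as is. The extra lists only appear in the community dump, and are filled by its preparation and extension steps.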

To execute the dump of the products related to Research Communities - Research Infrastructures/Initiatives, the following steps are performed:

  • preliminary step to save on HDFS information about the RC/RI (a map associating each community identifier with its label)
  • dump step, where each result associated with an RC/RI, whatever its type, is mapped from the internal model to one instance of eu.dnetlib.dhp.schema.dump.oaf.community.CommunityResult (for details on how the dump is performed, see https://docs.google.com/document/d/1IUqSEC1G8t_chtNSpC6KRewv1TZzlS4t_Wx5lgnKGYI/edit)
  • preparation step, where each result is associated with the list of projects it has a relation with. Each project in the list is of type eu.dnetlib.dhp.schema.dump.oaf.community.Project
  • extension step, where each previously dumped result is updated with information about the projects it is associated with, as provided by the preparation step
  • split step, where each result is associated with the community(ies) it belongs to. Each community has a specific “folder” containing all its related results
  • archive step, where a tar archive is created to store the dump of each community
  • publish step, where each community archive is published on Zenodo
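The split step can be sketched as grouping the dumped results by the community identifiers they carry. The sketch uses the identifier map saved in the preliminary step to keep only the RC/RI it knows about. This is an in-memory simplification under stated assumptions: the real job presumably runs on Spark and writes one folder per community on HDFS. The `CommunityResultStub` class and its fields are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class CommunitySplit {

    // Hypothetical minimal result: an id plus the community identifiers it is associated with.
    static final class CommunityResultStub {
        final String id;
        final List<String> communities;

        CommunityResultStub(String id, List<String> communities) {
            this.id = id;
            this.communities = communities;
        }
    }

    /**
     * Groups result ids by community id (one group per output "folder").
     * A result associated with several communities ends up in several groups;
     * community ids that are not in the map from the preliminary step are skipped.
     */
    static Map<String, List<String>> split(List<CommunityResultStub> results,
                                           Map<String, String> communityLabels) {
        Map<String, List<String>> byCommunity = new TreeMap<>();
        for (CommunityResultStub r : results) {
            for (String communityId : r.communities) {
                if (!communityLabels.containsKey(communityId)) {
                    continue; // not one of the RC/RI saved in the preliminary step
                }
                byCommunity.computeIfAbsent(communityId, k -> new ArrayList<>()).add(r.id);
            }
        }
        return byCommunity;
    }
}
```

Each entry of the returned map corresponds to one community "folder". The archive step would then produce one tar per entry.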

To execute the dump of the whole graph, the following steps are performed:

  • preliminary step to save on HDFS information about the RC/RI (a map associating each community identifier with its label)
  • dump of each entity in the internal model:
    • each result, whatever its type, is dumped as one instance of eu.dnetlib.dhp.schema.dump.oaf.Result
    • each organization is dumped as one instance of eu.dnetlib.dhp.schema.dump.oaf.graph.Organization
    • each datasource is dumped as one instance of eu.dnetlib.dhp.schema.dump.oaf.graph.Datasource
    • each project is dumped as one instance of eu.dnetlib.dhp.schema.dump.oaf.graph.Project
    • each relation is dumped as one instance of eu.dnetlib.dhp.schema.dump.oaf.graph.Relation
  • entity creation step, where entities of type eu.dnetlib.dhp.schema.dump.oaf.graph.ResearchCommunity or eu.dnetlib.dhp.schema.dump.oaf.graph.ResearchInitiative are created from the information contained in the RC/RI profiles
  • relation creation step, where instances of type eu.dnetlib.dhp.schema.dump.oaf.graph.Relation are created from:
    • the RC/RI profiles, by instantiating a relation between the RC/RI entity and each datasource and project stored in the community profile
    • the organizations related to the RC/RI. This information is not taken from the profile: it is given as a parameter, and it is the same information used for the propagation of communities through organizations
    • the result entities. The collectedfrom, hostedby and context fields of the results in the internal model are not dumped in the external model; relations are created instead:
      • collectedfrom becomes datasource -> provides -> result and result -> isProvidedBy -> datasource
      • hostedby becomes datasource -> hosts -> result and result -> isHostedBy -> datasource
      • context becomes context <-> isRelatedTo <-> result
  • collection step, where all dumped entities of the same type are stored in the same folder
  • archive step, where a tar archive is produced for each type of dumped entity
  • publish step, where each produced archive is published on Zenodo
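The mapping of `collectedfrom`, `hostedby` and `context` into relations can be sketched as follows. Each link found in an internal result produces a pair of relations, one per direction, using the relation names listed above. The flattened `Rel` class (source, relation name, target) is a hypothetical simplification of `eu.dnetlib.dhp.schema.dump.oaf.graph.Relation`, which in the PR also carries `Node` and `RelType` information. The plain string identifiers are likewise an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

final class ResultRelations {

    // Hypothetical, flattened stand-in for the dump Relation: source -> name -> target.
    static final class Rel {
        final String source;
        final String name;
        final String target;

        Rel(String source, String name, String target) {
            this.source = source;
            this.name = name;
            this.target = target;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Rel)) return false;
            Rel r = (Rel) o;
            return source.equals(r.source) && name.equals(r.name) && target.equals(r.target);
        }

        @Override
        public int hashCode() {
            return Objects.hash(source, name, target);
        }

        @Override
        public String toString() {
            return source + " -> " + name + " -> " + target;
        }
    }

    /** Emits both directions for every collectedfrom, hostedby and context link of one result. */
    static List<Rel> fromResult(String resultId, List<String> collectedFrom,
                                List<String> hostedBy, List<String> contexts) {
        List<Rel> rels = new ArrayList<>();
        for (String ds : collectedFrom) {
            rels.add(new Rel(ds, "provides", resultId));
            rels.add(new Rel(resultId, "isProvidedBy", ds));
        }
        for (String ds : hostedBy) {
            rels.add(new Rel(ds, "hosts", resultId));
            rels.add(new Rel(resultId, "isHostedBy", ds));
        }
        for (String ctx : contexts) {
            // isRelatedTo is symmetric, so the same name is used in both directions
            rels.add(new Rel(ctx, "isRelatedTo", resultId));
            rels.add(new Rel(resultId, "isRelatedTo", ctx));
        }
        return rels;
    }
}
```

Emitting both directions explicitly lets consumers navigate the dump from either side without having to know the inverse semantics.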
claudio.atzori reviewed 2020-08-12 19:12:53 +02:00
@ -0,0 +29,4 @@
Assertions.assertEquals(200, client.uploadIS(is, "COVID-19.json.gz", file.length()));
String metadata = "{\"metadata\":{\"access_right\":\"open\",\"communities\":[{\"identifier\":\"openaire-research-graph\"}],\"creators\":[{\"affiliation\":\"ISTI - CNR\",\"name\":\"Bardi, Alessia\",\"orcid\":\"0000-0002-1112-1292\"},{\"affiliation\":\"eifl\", \"name\":\"Kuchma, Iryna\"},{\"affiliation\":\"BIH\", \"name\":\"Brobov, Evgeny\"},{\"affiliation\":\"GIDIF RBM\", \"name\":\"Truccolo, Ivana\"},{\"affiliation\":\"unesp\", \"name\":\"Monteiro, Elizabete\"},{\"affiliation\":\"und\", \"name\":\"Casalegno, Carlotta\"},{\"affiliation\":\"CARL ABRC\", \"name\":\"Clary, Erin\"},{\"affiliation\":\"The University of Edimburgh\", \"name\":\"Romanowski, Andrew\"},{\"affiliation\":\"ISTI - CNR\", \"name\":\"Pavone, Gina\"},{\"affiliation\":\"ISTI - CNR\", \"name\":\"Artini, Michele\"},{\"affiliation\":\"ISTI - CNR\",\"name\":\"Atzori, Claudio\",\"orcid\":\"0000-0001-9613-6639\"},{\"affiliation\":\"University of Bielefeld\",\"name\":\"Bäcker, Amelie\",\"orcid\":\"0000-0001-6015-2063\"},{\"affiliation\":\"ISTI - CNR\",\"name\":\"Baglioni, Miriam\",\"orcid\":\"0000-0002-2273-9004\"},{\"affiliation\":\"University of Bielefeld\",\"name\":\"Czerniak, Andreas\",\"orcid\":\"0000-0003-3883-4169\"},{\"affiliation\":\"ISTI - CNR\",\"name\":\"De Bonis, Michele\"},{\"affiliation\":\"Athena Research and Innovation Centre\",\"name\":\"Dimitropoulos, Harry\"},{\"affiliation\":\"Athena Research and Innovation Centre\",\"name\":\"Foufoulas, Ioannis\"},{\"affiliation\":\"University of Warsaw\",\"name\":\"Horst, Marek\"},{\"affiliation\":\"Athena Research and Innovation Centre\",\"name\":\"Iatropoulou, Katerina\"},{\"affiliation\":\"University of Warsaw\",\"name\":\"Jacewicz, Przemyslaw\"},{\"affiliation\":\"Athena Research and Innovation Centre\",\"name\":\"Kokogiannaki, Argiro\", \"orcid\":\"0000-0002-3880-0244\"},{\"affiliation\":\"ISTI - CNR\",\"name\":\"La Bruzzo, Sandro\",\"orcid\":\"0000-0003-2855-1245\"},{\"affiliation\":\"ISTI - CNR\",\"name\":\"Lazzeri, 
Emma\"},{\"affiliation\":\"University of Bielefeld\",\"name\":\"Löhden, Aenne\"},{\"affiliation\":\"ISTI - CNR\",\"name\":\"Manghi, Paolo\",\"orcid\":\"0000-0001-7291-3210\"},{\"affiliation\":\"ISTI - CNR\",\"name\":\"Mannocci, Andrea\",\"orcid\":\"0000-0002-5193-7851\"},{\"affiliation\":\"Athena Research and Innovation Center\",\"name\":\"Manola, Natalia\"},{\"affiliation\":\"ISTI - CNR\",\"name\":\"Ottonello, Enrico\"},{\"affiliation\":\"University of Bielefeld\",\"name\":\"Shirrwagen, Jochen\"}],\"description\":\"\\u003cp\\u003eThis dump provides access to the metadata records of publications, research data, software and projects that may be relevant to the Corona Virus Disease (COVID-19) fight. The dump contains records of the OpenAIRE COVID-19 Gateway (https://covid-19.openaire.eu/), identified via full-text mining and inference techniques applied to the OpenAIRE Research Graph (https://explore.openaire.eu/). The Graph is one of the largest Open Access collections of metadata records and links between publications, datasets, software, projects, funders, and organizations, aggregating 12,000+ scientific data sources world-wide, among which the Covid-19 data sources Zenodo COVID-19 Community, WHO (World Health Organization), BIP! FInder for COVID-19, Protein Data Bank, Dimensions, scienceOpen, and RSNA. \\u003cp\\u003eThe dump consists of a gzip file containing one json per line. Each json is compliant to the schema available at https://doi.org/10.5281/zenodo.3974226\\u003c/p\\u003e \",\"title\":\"OpenAIRE Covid-19 publications, datasets, software and projects metadata.\",\"upload_type\":\"dataset\",\"version\":\"1.0\"}}";

Can we define this as a classpath test resource?

Author
Member

yes, I will
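A minimal sketch of what the reviewer is asking for follows. The Zenodo metadata JSON moves into a file under the test resources and is read from the classpath, instead of being inlined as a string literal. The resource path and the `TestResources` helper are hypothetical. The read loop avoids `InputStream.readAllBytes` so that it also works on Java 8.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class TestResources {

    // Hypothetical location of the Zenodo metadata under src/test/resources.
    static final String ZENODO_METADATA = "/eu/dnetlib/dhp/oa/graph/dump/zenodo_metadata.json";

    private TestResources() {}

    /** Reads a classpath resource as a UTF-8 string, failing loudly if it is missing. */
    static String read(String path) throws IOException {
        try (InputStream in = TestResources.class.getResourceAsStream(path)) {
            if (in == null) {
                throw new IllegalArgumentException("classpath resource not found: " + path);
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```

In the test, the long literal would then become `String metadata = TestResources.read(TestResources.ZENODO_METADATA);`. This keeps the JSON editable and readable on its own.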

claudio.atzori reviewed 2020-08-12 19:17:14 +02:00
@ -0,0 +16,4 @@
private Pid pid;
private List<String> affiliation;

We shouldn't expose author affiliations: I believe they create confusion with the result-organization relationships we are already providing.

Author
Member

Ok, I will remove that part. There is no problem for the dump of community products, since we do not provide relations there. I think we could keep it for communities and remove it from the dump of the whole graph.

As for the complete graph, it should be removed, yes.

For the communities, instead, it is extra information that could potentially be useful to the dump consumers, but I still have the feeling that we have not analysed it enough to have a clear picture of its usefulness.

We should start with something simple:

  • coverage of results with authors that provide such information;
  • check whether the same affiliation text is always expressed in the same format across different authors;
  • top N affiliations;
  • ...

So, IMO we can keep it, but until basic statistics prove its usefulness, I doubt we should advertise it.

Author
Member

Ok then, I will remove it for RC/RI too. If needed, we can always add it later.
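The first and third statistics suggested above (coverage and top N affiliations) could be computed along the lines of the sketch below. For brevity it works on an in-memory list of results, each being a list of authors. On the real graph the same logic would presumably be a Spark aggregation. The `AuthorStub` class is hypothetical. The trim plus lower-case normalisation is a deliberately naive choice that merges only trivially different spellings, which also hints at how often the same affiliation is written in different formats.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

final class AffiliationStats {

    // Hypothetical minimal author: only the affiliation strings matter here.
    static final class AuthorStub {
        final List<String> affiliation;

        AuthorStub(List<String> affiliation) {
            this.affiliation = affiliation;
        }
    }

    /** Fraction of results having at least one author with a non-empty affiliation list. */
    static double coverage(List<List<AuthorStub>> results) {
        if (results.isEmpty()) {
            return 0.0;
        }
        long withAffiliation = results.stream()
                .filter(authors -> authors.stream()
                        .anyMatch(a -> a.affiliation != null && !a.affiliation.isEmpty()))
                .count();
        return (double) withAffiliation / results.size();
    }

    /** The n most frequent affiliations after a naive normalisation (trim + lower case). */
    static List<String> topN(List<List<AuthorStub>> results, int n) {
        Map<String, Long> counts = new HashMap<>();
        for (List<AuthorStub> authors : results) {
            for (AuthorStub a : authors) {
                if (a.affiliation == null) continue;
                for (String aff : a.affiliation) {
                    String key = aff.trim().toLowerCase(Locale.ROOT);
                    if (!key.isEmpty()) {
                        counts.merge(key, 1L, Long::sum);
                    }
                }
            }
        }
        // Sort by descending count, breaking ties alphabetically for a stable output.
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```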

claudio.atzori reviewed 2020-08-12 19:21:55 +02:00
@ -0,0 +3,4 @@
import java.io.Serializable;
import eu.dnetlib.dhp.schema.oaf.StructuredProperty;

Can we have a separate factory class to build a ControlledField from a StructuredProperty? One model definition should never depend directly on the other; the mapping should express such a dependency instead.

Author
Member

I will change the implementation of the method that brings the dependency

Author
Member

Actually, there was no need: the import was just a leftover. I should have removed it already and simply forgot.
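A minimal sketch of the separation suggested by the reviewer: the dump model class imports nothing from the internal model, and a dedicated factory owns the mapping. `InternalPid` stands in for `eu.dnetlib.dhp.schema.oaf.StructuredProperty` (value plus qualifier class id) so that the example compiles on its own. Its fields, and the `ControlledFieldFactory.fromStructuredProperty` name, are hypothetical.

```java
import java.io.Serializable;

// Dump model class: no import from the internal model.
class ControlledField implements Serializable {
    private String scheme;
    private String value;

    public String getScheme() { return scheme; }
    public void setScheme(String scheme) { this.scheme = scheme; }
    public String getValue() { return value; }
    public void setValue(String value) { this.value = value; }
}

// Hypothetical stand-in for the internal StructuredProperty (qualifier class id + value).
class InternalPid {
    final String classid;
    final String value;

    InternalPid(String classid, String value) {
        this.classid = classid;
        this.value = value;
    }
}

// The mapping, not the model, owns the dependency between the two representations.
final class ControlledFieldFactory {
    private ControlledFieldFactory() {}

    static ControlledField fromStructuredProperty(InternalPid sp) {
        if (sp == null) {
            return null;
        }
        ControlledField cf = new ControlledField();
        cf.setScheme(sp.classid);
        cf.setValue(sp.value);
        return cf;
    }
}
```

Keeping the conversion in a factory means that the dump schema module could be published and consumed without dragging the internal model along.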

claudio.atzori reviewed 2020-08-12 19:28:36 +02:00
@ -0,0 +7,4 @@
import eu.dnetlib.dhp.schema.oaf.ExtraInfo;
//ExtraInfo renamed ExternalReference do not confuse with ExternalReference in oaf schema
public class ExternalReference implements Serializable {

I see two major issues in this model class:

  • If the purpose of this class is to model references to external objects (e.g. PDB codes), then I suggest renaming it, as ExternalReference clearly clashes with the internal model type;
  • The fields in this class indicate that it models citations as they are currently represented in our internal model. Let me remind you that citations are currently encoded as XML elements stored as plain strings. The resulting JSON serialization, although well defined, would include a field (value) containing the XML representation of the citation. I am sure we do not want to expose such mixed encodings in our official dumps.
Author
Member

I do not remember why we chose to dump the ExtraInfo of the internal model and name it ExternalReference in the public model. I agree with you: we do not want to expose mixed encodings. But we must decide what we want to deliver. Do we want to provide the ExternalReference of our internal model, or do we want to provide citation information? In the first case, we need to redefine the mapping. In the second case, we need to decide how to parse the XML containing the citations and which values we want to provide.

The general idea is to expose citations as relationships between objects found in the graph. At the moment, this would require implementing a mapping from the current XML-based encoding to a target representation based on the common dump Relation model. Personally, I would avoid slowing down the integration of this PR any further, and postpone the implementation of such a mapping to a (near) future enhancement of the dump procedure.

We still need to discuss whether we also want to expose the citation texts describing objects that are not found in the graph and, if so, how to encode them in the dump.

Ok, so for the moment ExternalReferences will be removed from the dump data model:

  • citations will be mapped as proper relationships in September;
  • references to PDB codes will be integrated as proper entities once the integration with Scholexplorer is completed.
claudio.atzori reviewed 2020-08-12 19:29:51 +02:00
@ -0,0 +22,4 @@
// ( article | book ) processing charges. Defined here to cope with possible wrongly typed
// results
// private Field<String> processingchargeamount;

Commented code lines/blocks should be removed.

Author
Member

ack

claudio.atzori reviewed 2020-08-12 19:31:34 +02:00
@ -0,0 +170,4 @@
this.subjects = subjects;
}
// public String getPolicies() {

Commented code lines/blocks should be removed.

Author
Member

ack

claudio.atzori reviewed 2020-08-12 19:32:19 +02:00
@ -0,0 +16,4 @@
private String jurisdiction;
// public String getId() {

Commented code lines/blocks should be removed.

Author
Member

ack

claudio.atzori reviewed 2020-08-12 19:32:59 +02:00
@ -0,0 +76,4 @@
this.pid = pid;
}
// public List<KeyValue> getCollectedfrom() {

Commented code lines/blocks should be removed.

Author
Member

ack

claudio.atzori reviewed 2020-08-12 19:34:43 +02:00
@ -90,0 +91,4 @@
<dependency>
<groupId>com.squareup.okhttp3</groupId>
<artifactId>okhttp</artifactId>
<version>4.7.2</version>

Submodules should never declare the version of an external library; please move the dependency version to the main pom, where all the dependencies are declared along with their versions.

Author
Member

ack

claudio.atzori reviewed 2020-08-12 19:36:41 +02:00
@ -41,6 +41,43 @@
</build>
<dependencies>
<!-- <dependency>-->

Please remove unnecessary dependencies

Author
Member

In the same pom there are also versions associated with external libraries. I will move them to the main pom, as per the comment above.

claudio.atzori reviewed 2020-08-12 19:38:13 +02:00
@ -0,0 +1,105 @@
/**

Empty Javadoc declaration? :) Please provide a concise one in the correct place, right before the class name declaration :)

Author
Member

ack

claudio.atzori reviewed 2020-08-12 19:40:11 +02:00
@ -0,0 +52,4 @@
Element root = doc.getRootElement();
map.put(root.attribute("id").getValue(), root.attribute("label").getValue());
} catch (DocumentException e) {
e.printStackTrace();

Please allow the exception to propagate to the caller; printing it is not that helpful.

Author
Member

Done.
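A minimal sketch of the requested change follows: the checked parsing exceptions are declared and propagated instead of being printed and swallowed. A malformed profile then makes the job fail, rather than silently producing a partial community map. To keep the sketch self-contained, it uses the JDK DOM parser instead of the dom4j API used in the PR. The `CommunityMapBuilder` name and the shape of the profile XML (a root element with `id` and `label` attributes, mirroring the diff above) are assumptions.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class CommunityMapBuilder {

    private CommunityMapBuilder() {}

    /**
     * Builds the community id -> label map from the RC/RI profiles.
     * Parsing errors are propagated to the caller instead of being printed.
     */
    static Map<String, String> build(List<String> profiles)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        Map<String, String> map = new HashMap<>();
        for (String xml : profiles) {
            Element root = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            map.put(root.getAttribute("id"), root.getAttribute("label"));
        }
        return map;
    }
}
```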

claudio.atzori reviewed 2020-08-12 19:42:04 +02:00
@ -0,0 +1,84 @@
/**

Consider moving the Javadoc right before the class name declaration.

Author
Member

ack

claudio.atzori reviewed 2020-08-12 19:43:25 +02:00
@ -0,0 +1,83 @@
/**

Consider moving the Javadoc right before the class name declaration.

Author
Member

ack

claudio.atzori reviewed 2020-08-12 19:43:49 +02:00
@ -0,0 +1,62 @@
/**

Consider moving the Javadoc right before the class name declaration.

Author
Member

ack

claudio.atzori reviewed 2020-08-12 19:44:28 +02:00
@ -0,0 +1,187 @@
/**

Consider moving the Javadoc right before the class name declaration.

Author
Member

ack

claudio.atzori reviewed 2020-08-12 19:45:01 +02:00
@ -0,0 +1,48 @@
/**

Consider moving the Javadoc right before the class name declaration.

Author
Member

ack

claudio.atzori reviewed 2020-08-12 19:46:33 +02:00
@ -0,0 +4,4 @@
import java.io.Serializable;
public class Constants implements Serializable {
// collectedFrom va con isProvidedBy -> becco da ModelSupport

I guess this comment is a leftover: either translate it into English, or delete it :)

Author
Member

It was a leftover, like the one directly following it :). Removed.

claudio.atzori reviewed 2020-08-12 19:47:06 +02:00
@ -0,0 +1,84 @@
/**

Consider moving the Javadoc right before the class name declaration.

Author
Member

ack

claudio.atzori reviewed 2020-08-12 19:47:31 +02:00
@ -0,0 +1,105 @@
/**

Consider moving the Javadoc right before the class name declaration.

Author
Member

ack

claudio.atzori reviewed 2020-08-12 19:48:00 +02:00
@ -0,0 +1,125 @@
/**

Consider moving the Javadoc right before the class name declaration.

Author
Member

ack

claudio.atzori reviewed 2020-08-12 19:49:01 +02:00
@ -0,0 +71,4 @@
cce.execute(Process::getRelation, CONTEX_RELATION_DATASOURCE, ModelSupport.getIdPrefix(Datasource.class));
log.info("Creating relations for projects... ");
// cce

Commented code lines/blocks should be removed.

Author
Member

These lines are left on purpose: we should remember to also generate the relations between contexts and projects once the coverage of projects carrying their OpenAIRE id in the community profiles is higher. I would be in favour of uncommenting them and getting the relations we can already have.

Fine, this is the kind of comment that is well suited to sit next to a commented code block.
claudio.atzori reviewed 2020-08-12 19:50:04 +02:00
@ -0,0 +1,502 @@
/**

Consider moving the Javadoc right before the class name declaration.
Author
Member

ack
claudio.atzori reviewed 2020-08-12 19:53:51 +02:00
@ -0,0 +436,4 @@
return f;
} catch (DocumentException e) {
e.printStackTrace();

Let the exception propagate to the caller; printing it is not that useful.
Author
Member

ack
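Here is a minimal sketch of that suggestion, under stated assumptions. The code under review uses dom4j (`DocumentException`), but to keep the example self-contained it swaps in the JDK's built-in DOM parser. The `ProfileParser` class and its `rootName` method are hypothetical. The method declares its checked exceptions instead of catching and printing them, so a malformed profile fails at the caller instead of being silently swallowed.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical helper illustrating exception propagation while parsing XML. */
class ProfileParser {

    /**
     * Parses an XML profile and returns the name of its root element.
     * Checked exceptions are declared rather than caught and printed,
     * so the caller decides how to react to a malformed profile.
     */
    static String rootName(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getNodeName();
    }
}
```

With this shape, the job's entry point can log the failure once and abort the run, rather than continuing with a partially built result.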
claudio.atzori reviewed 2020-08-12 19:54:53 +02:00
@ -0,0 +1,197 @@
/**

Consider moving the Javadoc right before the class name declaration.
Author
Member

ack
claudio.atzori reviewed 2020-08-12 19:56:15 +02:00
@ -0,0 +1,98 @@
/**

Consider moving the Javadoc right before the class name declaration.
Author
Member

ack
claudio.atzori reviewed 2020-08-12 19:59:47 +02:00
@ -0,0 +1,88 @@
/**

Consider moving the Javadoc right before the class name declaration.
Author
Member

ack
claudio.atzori reviewed 2020-08-12 20:00:13 +02:00
@ -0,0 +1,51 @@
/**

Consider moving the Javadoc right before the class name declaration.
Author
Member

ack
claudio.atzori reviewed 2020-08-12 20:00:34 +02:00
@ -0,0 +1,110 @@
/**

Consider moving the Javadoc right before the class name declaration.
Author
Member

ack
claudio.atzori reviewed 2020-08-12 20:01:02 +02:00
@ -0,0 +1,56 @@
/**

Consider moving the Javadoc right before the class name declaration.
Author
Member

ack
claudio.atzori reviewed 2020-08-12 20:01:17 +02:00
@ -0,0 +1,157 @@
/**

Consider moving the Javadoc right before the class name declaration.
Author
Member

ack
claudio.atzori requested changes 2020-08-12 20:05:33 +02:00
claudio.atzori left a comment
Owner

A general remark concerns the lack of Javadoc in the dump model classes. As I know you already have descriptions for them in the Google document, please consider moving them into proper Javadoc definitions.

Except for a few issues in the model definition that should be discussed further, the majority of the comments are about minor changes.

Thanks for this HUGE contribution :)
claudio.atzori added the enhancement label 2020-08-13 11:31:51 +02:00
claudio.atzori reviewed 2020-08-13 12:30:47 +02:00
@ -37,3 +37,3 @@
<arg>--hdfsNameNode</arg><arg>${nameNode}</arg>
<arg>--fileURL</arg><arg>${projectFileURL}</arg>
-<arg>--hdfsPath</arg><arg>${workingDir}/projects</arg>
+<arg>--hdfsPath</arg><arg>${workingDir}/project</arg>

Was this workflow changed on purpose? I assume the project enrichment workflow has nothing to do with the dump procedures. If this change was not intentional, please revert it.
Author
Member

The change was not intentional and I will revert it. In any case, it does not affect the execution of the process, since it is a directory under workingDir used to store information, and it is referred to in the same way everywhere in the workflow.

PR manually integrated by https://code-repo.d4science.org/D-Net/dnet-hadoop/commit/5b994d7ccfdd6f74945be35949a208877678e5ad
claudio.atzori closed this pull request 2020-08-14 15:41:16 +02:00

Reference: D-Net/dnet-hadoop#40