dump of the results related to at least one project #61

Merged
claudio.atzori merged 51 commits from miriam.baglioni/dnet-hadoop:dump into master 2020-12-09 17:22:57 +01:00

This PR adds the dump of results related to funders. Following the model of the ResearchCommunity result dump, it dumps the results funded by at least one project, splits them by funder, and associates with each funder the set of results it has funded.
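As a rough illustration of the "split by funder" step described above, here is a minimal in-memory sketch. It is not the Spark code from this PR: the `Funding` class, its fields, and `resultsByFunder` are hypothetical names introduced only for the example, and the funder is assumed to be already known for each result-project link.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class FunderSplitSketch {

    /** Hypothetical pair: a result id and the funder of one project that produced it. */
    public static final class Funding {
        public final String resultId;
        public final String funder;

        public Funding(String resultId, String funder) {
            this.resultId = resultId;
            this.funder = funder;
        }
    }

    /** Groups result ids by funder; a result funded by several funders appears under each. */
    public static Map<String, Set<String>> resultsByFunder(List<Funding> links) {
        Map<String, Set<String>> byFunder = new TreeMap<>();
        for (Funding link : links) {
            byFunder.computeIfAbsent(link.funder, k -> new TreeSet<>()).add(link.resultId);
        }
        return byFunder;
    }

    public static void main(String[] args) {
        List<Funding> links = Arrays.asList(
            new Funding("r1", "EC"),
            new Funding("r2", "EC"),
            new Funding("r1", "NSF"));
        System.out.println(resultsByFunder(links));
    }
}
```

Results without any project link never enter the input list, which matches the "funded by at least one project" condition in the description.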

Main modifications:

  • the semantics used to extract the link to projects have changed: "isProducedBy" is used instead of "produces", because relationships with relType "outcome" are not guaranteed to exist in both directions

  • the type of dump to be made on the result has changed from boolean to {"complete","community","funder"}
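One way to model the change from a boolean flag to a three-valued dump type is an enum parsed from the workflow argument. The sketch below is an assumption about the shape of such a type, not the class actually added in this PR: `DumpType`, `fromArg`, and `describe` are hypothetical names, and the description given for "complete" is a guess.

```java
import java.util.Locale;

public class DumpTypeSketch {

    /** Hypothetical enum covering the three dump kinds listed in the PR description. */
    public enum DumpType {
        COMPLETE, COMMUNITY, FUNDER;

        /** Parses an argument value such as "funder"; unknown values throw IllegalArgumentException. */
        public static DumpType fromArg(String value) {
            return valueOf(value.trim().toUpperCase(Locale.ROOT));
        }
    }

    /** Illustrative dispatch only; the returned descriptions are not taken from the PR code. */
    public static String describe(DumpType type) {
        switch (type) {
            case COMMUNITY:
                return "results related to research communities";
            case FUNDER:
                return "results funded by at least one project, split by funder";
            default:
                return "all results";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(DumpType.fromArg("funder")));
    }
}
```

Compared with a boolean, an enum makes a third case explicit and rejects misspelled argument values early instead of silently falling into a default branch.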

miriam.baglioni added the enhancement label 2020-11-25 14:14:20 +01:00
claudio.atzori was assigned by miriam.baglioni 2020-11-25 14:14:20 +01:00
alessia.bardi was assigned by miriam.baglioni 2020-11-25 14:14:20 +01:00
claudio.atzori reviewed 2020-11-25 17:19:07 +01:00
@ -93,3 +93,1 @@
if (name.trim().equalsIgnoreCase("communities_infrastructures")) {
name = "communities_infrastructures.json";
}
// if (name.trim().equalsIgnoreCase("communities_infrastructures")) {

please clean up unused code

Author
Member

done

claudio.atzori reviewed 2020-11-25 17:23:12 +01:00
@ -26,6 +26,8 @@ public class Constants {
public static String ORCID = "orcid";
public static String RESULT_PROJECT_IS_PRODUCED_BY = "isProducedBy";
Please avoid duplicating the constants. I think you can refer to https://code-repo.d4science.org/D-Net/dnet-hadoop/src/branch/master/dhp-schemas/src/main/java/eu/dnetlib/dhp/schema/common/ModelConstants.java#L54
Author
Member

ModelConstants does not contain ORCID. I can add it there instead. Ok for IS_PRODUCED_BY

claudio.atzori reviewed 2020-11-25 17:25:49 +01:00
@ -0,0 +274,4 @@
--conf spark.sql.warehouse.dir=${sparkSqlWarehouseDir}
</spark-opts>
<arg>--sourcePath</arg><arg>${workingDir}/result/publication</arg>
<!-- <arg>&#45;&#45;sourcePath</arg><arg>${sourcePath}/publication</arg>-->

cleanup commented definitions, please

Author
Member

done

claudio.atzori reviewed 2020-11-25 17:26:07 +01:00
@ -0,0 +302,4 @@
--conf spark.sql.warehouse.dir=${sparkSqlWarehouseDir}
</spark-opts>
<arg>--sourcePath</arg><arg>${workingDir}/result/dataset</arg>
<!-- <arg>&#45;&#45;sourcePath</arg><arg>${sourcePath}/dataset</arg>-->

cleanup comments, please

claudio.atzori reviewed 2020-11-25 17:26:25 +01:00
@ -0,0 +330,4 @@
--conf spark.sql.warehouse.dir=${sparkSqlWarehouseDir}
</spark-opts>
<arg>--sourcePath</arg><arg>${workingDir}/result/otherresearchproduct</arg>
<!-- <arg>&#45;&#45;sourcePath</arg><arg>${sourcePath}/otherresearchproduct</arg>-->

yet another comment to cleanup

claudio.atzori reviewed 2020-11-25 17:26:56 +01:00
@ -0,0 +358,4 @@
--conf spark.sql.warehouse.dir=${sparkSqlWarehouseDir}
</spark-opts>
<arg>--sourcePath</arg><arg>${workingDir}/result/software</arg>
<!-- <arg>&#45;&#45;sourcePath</arg><arg>${sourcePath}/software</arg>-->

remove, please!

claudio.atzori reviewed 2020-11-25 17:40:18 +01:00
@ -0,0 +532,4 @@
<main-class>eu.dnetlib.dhp.oa.graph.dump.MakeTar</main-class>
<arg>--hdfsPath</arg><arg>${outputPath}</arg>
<arg>--nameNode</arg><arg>${nameNode}</arg>
<!-- <arg>&#45;&#45;sourcePath</arg><arg>${workingDir}/resultperfunder</arg>-->

cleanup, please

claudio.atzori reviewed 2020-11-25 17:41:11 +01:00
@ -126,0 +164,4 @@
final Consumer<ContextInfo> consumer = ci -> cInfoList.add(ci);
queryInformationSystem.getContextInformation(consumer);
//List<ResearchInitiative> riList = new ArrayList<>();

cleanup

Author
Member

done

claudio.atzori reviewed 2020-11-25 17:43:11 +01:00
@ -126,0 +169,4 @@
try {
writer.write(new Gson().toJson(Process.getEntity(cInfo)));
} catch (IOException e) {
e.printStackTrace();

Why should an exception raised here not interrupt the execution?

Author
Member

Because it is in a lambda expression. Anyway, we can remove this test: it was only needed to verify that the file was written compressed.

Author
Member

I have kept the test and removed the lambda.

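For context on the exchange above: a checked `IOException` cannot escape a `Consumer` lambda, which is why it tends to get caught and printed. The sketch below shows two common ways to keep a write failure fatal. It is a generic illustration, not the test code from this PR; `LambdaIoSketch`, `writeAllLoop`, and `writeAllLambda` are hypothetical names.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Arrays;
import java.util.List;

public class LambdaIoSketch {

    /** Plain loop: the checked IOException simply propagates to the caller. */
    public static void writeAllLoop(List<String> records, Writer writer) throws IOException {
        for (String record : records) {
            writer.write(record);
            writer.write('\n');
        }
    }

    /** Lambda variant: the checked exception is rethrown unchecked, so a failure still aborts. */
    public static void writeAllLambda(List<String> records, Writer writer) {
        records.forEach(record -> {
            try {
                writer.write(record);
                writer.write('\n');
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        writeAllLoop(Arrays.asList("a", "b"), out);
        writeAllLambda(Arrays.asList("c"), out);
        System.out.print(out);
    }
}
```

Either variant makes a unit test fail when writing fails, instead of printing a stack trace and passing.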
claudio.atzori reviewed 2020-11-25 17:45:40 +01:00
@ -31,2 +31,3 @@
"paramRequired": true
}
},{
"paramName":"dt",

you can probably indent this json record in a more uniform way

claudio.atzori reviewed 2020-11-25 17:45:52 +01:00
@ -0,0 +23,4 @@
"paramDescription": "the name of the result table we are currently working on",
"paramRequired": true
}, {
"paramName":"rp",

you can probably indent this json record in a more uniform way

claudio.atzori requested changes 2020-11-25 17:48:49 +01:00
claudio.atzori left a comment
Owner

Overall, it looks pretty good; just some minor changes are needed:

  • cleanup of commented-out lines in the code and in the workflow definitions
  • formatting of the JSON confs
  • an exception swallowed in a disabled unit test
Author
Member

Requested changes done. There is also another change to check: two classes have been added to common to allow the mapping for the doiBoost result in the public format.

claudio.atzori changed title from WIP: dump of the results related to at least one project to dump of the results related to at least one project 2020-12-09 17:22:49 +01:00
claudio.atzori closed this pull request 2020-12-09 17:22:57 +01:00