dump of the results related to at least one project #61

miriam.baglioni · 2020-11-25T14:14:20+01:00

miriam.baglioni commented

2020-11-25 14:14:20 +01:00

This PR adds the dump of results related to funders. Following the model of the ResearchCommunity result dump, it dumps the results funded by at least one project, splits them by funder, and associates each funder with the set of results it has funded.

Main modifications:

- the semantics used to extract the link to projects has changed: "isProducedBy" is used instead of "produces", because relationships with relType "outcome" are not guaranteed to be bidirectional
- the parameter selecting the type of dump to be made on the result has changed from a boolean to one of {"complete","community","funder"}
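The two modifications above can be illustrated with a minimal, self-contained sketch. It is not the code in this PR: `FunderDumpSketch`, `DumpType`, `Relation`, and `resultsByFunder` are hypothetical names, and the relation is reduced to three string fields. Only the three dump-type labels and the "isProducedBy" / "produces" relation names come from the description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FunderDumpSketch {

    /** Hypothetical enum replacing a boolean switch; only the three labels come from the PR text. */
    public enum DumpType {
        COMPLETE("complete"), COMMUNITY("community"), FUNDER("funder");

        private final String label;

        DumpType(String label) {
            this.label = label;
        }

        /** Parses a label, ignoring case and surrounding whitespace. */
        public static DumpType fromLabel(String label) {
            for (DumpType t : values()) {
                if (t.label.equalsIgnoreCase(label.trim())) {
                    return t;
                }
            }
            throw new IllegalArgumentException("Unknown dump type: " + label);
        }
    }

    /** Hypothetical, simplified relation: source id, relation class, and the funder of the linked project. */
    public static final class Relation {
        final String sourceId;
        final String relClass;
        final String funder;

        public Relation(String sourceId, String relClass, String funder) {
            this.sourceId = sourceId;
            this.relClass = relClass;
            this.funder = funder;
        }
    }

    /** Groups result ids by funder, reading only the result -> project direction. */
    public static Map<String, List<String>> resultsByFunder(List<Relation> relations) {
        Map<String, List<String>> byFunder = new TreeMap<>();
        for (Relation r : relations) {
            // Only "isProducedBy" is trusted; the inverse "produces" may be missing.
            if (!"isProducedBy".equals(r.relClass)) {
                continue;
            }
            List<String> ids = byFunder.computeIfAbsent(r.funder, k -> new ArrayList<>());
            if (!ids.contains(r.sourceId)) {
                ids.add(r.sourceId);
            }
        }
        return byFunder;
    }

    public static void main(String[] args) {
        List<Relation> relations = Arrays.asList(
            new Relation("result-1", "isProducedBy", "funder-A"),
            new Relation("result-1", "isProducedBy", "funder-B"),
            new Relation("result-2", "isProducedBy", "funder-A"),
            new Relation("project-9", "produces", "funder-A")); // skipped: wrong direction
        System.out.println(DumpType.fromLabel("funder"));
        System.out.println(resultsByFunder(relations));
    }
}
```

A result funded through projects of two different funders appears under both, which matches the "splits them by funder" wording; whether the actual job deduplicates in the same way is an assumption.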


miriam.baglioni added the

enhancement

label 2020-11-25 14:14:20 +01:00

claudio.atzori was assigned by miriam.baglioni

2020-11-25 14:14:20 +01:00

alessia.bardi was assigned by miriam.baglioni

2020-11-25 14:14:20 +01:00

claudio.atzori reviewed 2020-11-25 17:19:07 +01:00

dhp-common/src/main/java/eu/dnetlib/dhp/common/MakeTarArchive.java Outdated

						
				@ -93,3 +93,1 @@

							if (name.trim().equalsIgnoreCase("communities_infrastructures")) {

								name = "communities_infrastructures.json";

							}

				//			if (name.trim().equalsIgnoreCase("communities_infrastructures")) {

claudio.atzori commented

2020-11-25 17:19:07 +01:00

please clean up unused code

miriam.baglioni commented

2020-11-25 17:52:34 +01:00

done

claudio.atzori reviewed 2020-11-25 17:23:12 +01:00

dhp-workflows/dhp-graph-mapper/src/main/java/eu/dnetlib/dhp/oa/graph/dump/Constants.java Outdated

						
				@ -26,6 +26,8 @@ public class Constants {

					public static String ORCID = "orcid";

					public static String RESULT_PROJECT_IS_PRODUCED_BY = "isProducedBy";

claudio.atzori commented

2020-11-25 17:23:12 +01:00

Please avoid duplicating the constants. I think you can refer to https://code-repo.d4science.org/D-Net/dnet-hadoop/src/branch/master/dhp-schemas/src/main/java/eu/dnetlib/dhp/schema/common/ModelConstants.java#L54

miriam.baglioni commented

2020-11-25 17:57:08 +01:00

ModelConstants does not contain ORCID. I can add it there instead. Ok for IS_PRODUCED_BY

claudio.atzori reviewed 2020-11-25 17:25:49 +01:00

dhp-workflows/dhp-graph-mapper/src/main/resources/eu/dnetlib/dhp/oa/graph/dump/funderresults/oozie_app/workflow.xml Outdated

						
				@ -0,0 +274,4 @@

				                --conf spark.sql.warehouse.dir=${sparkSqlWarehouseDir}

				            </spark-opts>

				            <arg>--sourcePath</arg><arg>${workingDir}/result/publication</arg>

				<!--            <arg>&#45;&#45;sourcePath</arg><arg>${sourcePath}/publication</arg>-->

claudio.atzori commented

2020-11-25 17:25:49 +01:00

cleanup commented definitions, please

miriam.baglioni commented

2020-11-25 17:58:01 +01:00

done

claudio.atzori reviewed 2020-11-25 17:26:07 +01:00

dhp-workflows/dhp-graph-mapper/src/main/resources/eu/dnetlib/dhp/oa/graph/dump/funderresults/oozie_app/workflow.xml Outdated

						
				@ -0,0 +302,4 @@

				                --conf spark.sql.warehouse.dir=${sparkSqlWarehouseDir}

				            </spark-opts>

				            <arg>--sourcePath</arg><arg>${workingDir}/result/dataset</arg>

				<!--            <arg>&#45;&#45;sourcePath</arg><arg>${sourcePath}/dataset</arg>-->

claudio.atzori commented

2020-11-25 17:26:07 +01:00

cleanup comments, please

claudio.atzori reviewed 2020-11-25 17:26:25 +01:00

dhp-workflows/dhp-graph-mapper/src/main/resources/eu/dnetlib/dhp/oa/graph/dump/funderresults/oozie_app/workflow.xml Outdated

						
				@ -0,0 +330,4 @@

				                --conf spark.sql.warehouse.dir=${sparkSqlWarehouseDir}

				            </spark-opts>

				            <arg>--sourcePath</arg><arg>${workingDir}/result/otherresearchproduct</arg>

				<!--            <arg>&#45;&#45;sourcePath</arg><arg>${sourcePath}/otherresearchproduct</arg>-->

claudio.atzori commented

2020-11-25 17:26:25 +01:00

yet another comment to cleanup

claudio.atzori reviewed 2020-11-25 17:26:56 +01:00

dhp-workflows/dhp-graph-mapper/src/main/resources/eu/dnetlib/dhp/oa/graph/dump/funderresults/oozie_app/workflow.xml

						
				@ -0,0 +358,4 @@

				                --conf spark.sql.warehouse.dir=${sparkSqlWarehouseDir}

				            </spark-opts>

				            <arg>--sourcePath</arg><arg>${workingDir}/result/software</arg>

				<!--            <arg>&#45;&#45;sourcePath</arg><arg>${sourcePath}/software</arg>-->

claudio.atzori commented

2020-11-25 17:26:56 +01:00

remove, please!

claudio.atzori reviewed 2020-11-25 17:40:18 +01:00

dhp-workflows/dhp-graph-mapper/src/main/resources/eu/dnetlib/dhp/oa/graph/dump/funderresults/oozie_app/workflow.xml

						
				@ -0,0 +532,4 @@

				            <main-class>eu.dnetlib.dhp.oa.graph.dump.MakeTar</main-class>

				            <arg>--hdfsPath</arg><arg>${outputPath}</arg>

				            <arg>--nameNode</arg><arg>${nameNode}</arg>

				<!--            <arg>&#45;&#45;sourcePath</arg><arg>${workingDir}/resultperfunder</arg>-->

claudio.atzori commented

2020-11-25 17:40:17 +01:00

cleanup, please

claudio.atzori reviewed 2020-11-25 17:41:11 +01:00

dhp-workflows/dhp-graph-mapper/src/test/java/eu/dnetlib/dhp/oa/graph/dump/complete/CreateEntityTest.java Outdated

						
				@ -126,0 +164,4 @@

						final Consumer<ContextInfo> consumer = ci -> cInfoList.add(ci);

						queryInformationSystem.getContextInformation(consumer);

						//List<ResearchInitiative> riList = new ArrayList<>();

claudio.atzori commented

2020-11-25 17:41:11 +01:00

cleanup

miriam.baglioni commented

2020-11-25 17:59:53 +01:00

done

claudio.atzori reviewed 2020-11-25 17:43:11 +01:00

dhp-workflows/dhp-graph-mapper/src/test/java/eu/dnetlib/dhp/oa/graph/dump/complete/CreateEntityTest.java Outdated

						
				@ -126,0 +169,4 @@

							try {

								writer.write(new Gson().toJson(Process.getEntity(cInfo)));

							} catch (IOException e) {

								e.printStackTrace();

claudio.atzori commented

2020-11-25 17:43:11 +01:00

Why should an exception raised here not interrupt the execution?

miriam.baglioni commented

2020-11-25 18:02:31 +01:00

Because it is inside a lambda expression. Anyway, we can remove this test: it was only needed to verify that the file was written compressed.

miriam.baglioni commented

2020-11-25 18:13:07 +01:00

I have left the test, and removed the lambda
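For reference, the point discussed in this thread can be sketched as follows. A lambda passed to `forEach` cannot throw a checked `IOException`, so the exception is either rethrown as an unchecked one or the lambda is replaced by a plain loop. This is only an illustration under assumed names (`LambdaIoSketch`, `writeAllWithLambda`, `writeAllWithLoop`, `failingWriter`); it is not the test code from this PR.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Arrays;
import java.util.List;

public class LambdaIoSketch {

    /** Keeps the lambda, but rethrows the checked exception so the failure is not swallowed. */
    public static void writeAllWithLambda(List<String> lines, Writer writer) {
        lines.forEach(line -> {
            try {
                writer.write(line);
                writer.write('\n');
            } catch (IOException e) {
                // Instead of e.printStackTrace(): propagate and stop the execution.
                throw new UncheckedIOException(e);
            }
        });
    }

    /** Drops the lambda: a plain loop lets the IOException propagate directly. */
    public static void writeAllWithLoop(List<String> lines, Writer writer) throws IOException {
        for (String line : lines) {
            writer.write(line);
            writer.write('\n');
        }
    }

    /** Hypothetical helper: a writer that always fails, used to show the propagation. */
    public static Writer failingWriter() {
        return new Writer() {
            @Override
            public void write(char[] cbuf, int off, int len) throws IOException {
                throw new IOException("simulated write failure");
            }

            @Override
            public void flush() {
            }

            @Override
            public void close() {
            }
        };
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        writeAllWithLoop(Arrays.asList("first", "second"), out);
        System.out.print(out);
        try {
            writeAllWithLambda(Arrays.asList("first"), failingWriter());
        } catch (UncheckedIOException e) {
            System.out.println("interrupted: " + e.getCause().getMessage());
        }
    }
}
```

In a unit test, either variant makes the test fail when the write fails, which is the behaviour the review comment asks for.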

claudio.atzori reviewed 2020-11-25 17:45:40 +01:00

dhp-workflows/dhp-graph-mapper/src/main/resources/eu/dnetlib/dhp/oa/graph/dump/input_parameters.json Outdated

						
				@ -31,2 +31,3 @@

						"paramRequired": true

					}

					},{

					"paramName":"dt",

claudio.atzori commented

2020-11-25 17:45:40 +01:00

you can probably indent this json record in a more uniform way

claudio.atzori reviewed 2020-11-25 17:45:52 +01:00

dhp-workflows/dhp-graph-mapper/src/main/resources/eu/dnetlib/dhp/oa/graph/dump/input_parameters_link_prj.json Outdated

						
				@ -0,0 +23,4 @@

						"paramDescription": "the name of the result table we are currently working on",

						"paramRequired": true

					},	{

					"paramName":"rp",

claudio.atzori commented

2020-11-25 17:45:52 +01:00

you can probably indent this json record in a more uniform way

claudio.atzori requested changes 2020-11-25 17:48:49 +01:00

claudio.atzori left a comment

Overall, it looks pretty good; just some minor changes on:

- cleanup of commented-out code lines, including those in the workflow definitions
- formatting in the JSON confs
- an exception swallowed in a disabled unit test


miriam.baglioni commented

2020-12-09 17:14:56 +01:00

Requested changes done. There is also another change to check: two classes have been added to common to allow the mapping for the doiBoost result in the public format.

claudio.atzori changed title from ~~WIP: dump of the results related to at least one project~~ to dump of the results related to at least one project

2020-12-09 17:22:49 +01:00

claudio.atzori referenced this issue from a commit

2020-12-09 17:22:57 +01:00

Merge pull request 'dump of the results related to at least one project' (#61) from miriam.baglioni/dnet-hadoop:dump into master LGTM

claudio.atzori closed this pull request

2020-12-09 17:22:57 +01:00
