dump of the results related to at least one project #61

Merged
claudio.atzori merged 51 commits from miriam.baglioni/dnet-hadoop:dump into master 3 years ago
Collaborator

This PR adds the dump of results related to funders. Following the model of the ResearchCommunity result dump, it dumps the results funded by at least one project, splits them by funder, and associates each funder with the set of results it has funded.
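As a rough illustration of the splitting step described above, the following plain-Java sketch groups result identifiers by funder. It is not the Spark code from this PR: the `FundedResult` class, the `splitByFunder` method, and the funder labels are hypothetical names chosen for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class FunderSplitSketch {

    // Hypothetical minimal view of a result: its id plus the funders of the projects it is linked to.
    static final class FundedResult {
        final String resultId;
        final List<String> funders;

        FundedResult(String resultId, List<String> funders) {
            this.resultId = resultId;
            this.funders = funders;
        }
    }

    // Associates each funder with the set of result ids it funded.
    // Results with no funder never enter the map, mirroring "funded by at least one project".
    static Map<String, Set<String>> splitByFunder(List<FundedResult> results) {
        Map<String, Set<String>> byFunder = new TreeMap<>();
        for (FundedResult r : results) {
            for (String funder : r.funders) {
                byFunder.computeIfAbsent(funder, k -> new TreeSet<>()).add(r.resultId);
            }
        }
        return byFunder;
    }

    public static void main(String[] args) {
        List<FundedResult> results = Arrays.asList(
            new FundedResult("r1", Arrays.asList("funderA", "funderB")),
            new FundedResult("r2", Arrays.asList("funderA")),
            new FundedResult("r3", Arrays.<String>asList())); // unfunded, so it is not dumped
        System.out.println(splitByFunder(results));
    }
}
```

A result funded through projects of several funders appears under each of them, so the per-funder sets can overlap.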

Main modifications:

  • the semantics used to extract the link to projects has changed: "isProducedBy" is used instead of "produces", because relationships with relType "outcome" are not always bidirectional

  • the type of dump to be made on the result has changed from a boolean to one of {"complete","community","funder"}
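To illustrate the second point, here is a minimal sketch of how a three-valued dump type could replace a boolean flag. The enum name `DumpType` and the method `fromLabel` are assumptions made for this example and are not taken from the PR; only the three labels come from the description above.

```java
import java.util.Arrays;

public class DumpTypeSketch {

    // Hypothetical enum standing in for the former boolean flag.
    enum DumpType {
        COMPLETE("complete"), COMMUNITY("community"), FUNDER("funder");

        private final String label;

        DumpType(String label) {
            this.label = label;
        }

        String label() {
            return label;
        }

        // Resolves the textual parameter value to an enum constant, rejecting unknown values.
        static DumpType fromLabel(String label) {
            return Arrays.stream(values())
                .filter(t -> t.label.equals(label))
                .findFirst()
                .orElseThrow(() -> new IllegalArgumentException("Unknown dump type: " + label));
        }
    }

    public static void main(String[] args) {
        DumpType type = DumpType.fromLabel("funder");
        System.out.println(type + " -> " + type.label());
    }
}
```

Unlike a boolean, an enum of this kind makes an unsupported value fail early instead of silently selecting one of two modes.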

miriam.baglioni added the enhancement label 3 years ago
claudio.atzori was assigned by miriam.baglioni 3 years ago
alessia.bardi was assigned by miriam.baglioni 3 years ago
claudio.atzori reviewed 3 years ago
@ -93,3 +93,1 @@
if (name.trim().equalsIgnoreCase("communities_infrastructures")) {
name = "communities_infrastructures.json";
}
// if (name.trim().equalsIgnoreCase("communities_infrastructures")) {
Owner

please clean up unused code

Poster
Collaborator

done

claudio.atzori reviewed 3 years ago
@ -26,6 +26,8 @@ public class Constants {
public static String ORCID = "orcid";
public static String RESULT_PROJECT_IS_PRODUCED_BY = "isProducedBy";
Owner
Please avoid duplicating the constants. I think you can refer to https://code-repo.d4science.org/D-Net/dnet-hadoop/src/branch/master/dhp-schemas/src/main/java/eu/dnetlib/dhp/schema/common/ModelConstants.java#L54
Poster
Collaborator

ModelConstants does not contain ORCID. I can add it there instead. Ok for IS_PRODUCED_BY

claudio.atzori reviewed 3 years ago
@ -0,0 +274,4 @@
--conf spark.sql.warehouse.dir=${sparkSqlWarehouseDir}
</spark-opts>
<arg>--sourcePath</arg><arg>${workingDir}/result/publication</arg>
<!-- <arg>&#45;&#45;sourcePath</arg><arg>${sourcePath}/publication</arg>-->
Owner

cleanup commented definitions, please

Poster
Collaborator

done

claudio.atzori reviewed 3 years ago
@ -0,0 +302,4 @@
--conf spark.sql.warehouse.dir=${sparkSqlWarehouseDir}
</spark-opts>
<arg>--sourcePath</arg><arg>${workingDir}/result/dataset</arg>
<!-- <arg>&#45;&#45;sourcePath</arg><arg>${sourcePath}/dataset</arg>-->
Owner

cleanup comments, please

claudio.atzori reviewed 3 years ago
@ -0,0 +330,4 @@
--conf spark.sql.warehouse.dir=${sparkSqlWarehouseDir}
</spark-opts>
<arg>--sourcePath</arg><arg>${workingDir}/result/otherresearchproduct</arg>
<!-- <arg>&#45;&#45;sourcePath</arg><arg>${sourcePath}/otherresearchproduct</arg>-->
Owner

yet another comment to cleanup

claudio.atzori reviewed 3 years ago
@ -0,0 +358,4 @@
--conf spark.sql.warehouse.dir=${sparkSqlWarehouseDir}
</spark-opts>
<arg>--sourcePath</arg><arg>${workingDir}/result/software</arg>
<!-- <arg>&#45;&#45;sourcePath</arg><arg>${sourcePath}/software</arg>-->
Owner

remove, please!

claudio.atzori reviewed 3 years ago
@ -0,0 +532,4 @@
<main-class>eu.dnetlib.dhp.oa.graph.dump.MakeTar</main-class>
<arg>--hdfsPath</arg><arg>${outputPath}</arg>
<arg>--nameNode</arg><arg>${nameNode}</arg>
<!-- <arg>&#45;&#45;sourcePath</arg><arg>${workingDir}/resultperfunder</arg>-->
Owner

cleanup, please

claudio.atzori reviewed 3 years ago
@ -126,0 +164,4 @@
final Consumer<ContextInfo> consumer = ci -> cInfoList.add(ci);
queryInformationSystem.getContextInformation(consumer);
//List<ResearchInitiative> riList = new ArrayList<>();
Owner

cleanup

Poster
Collaborator

done

claudio.atzori reviewed 3 years ago
@ -126,0 +169,4 @@
try {
writer.write(new Gson().toJson(Process.getEntity(cInfo)));
} catch (IOException e) {
e.printStackTrace();
Owner

Why should an exception raised here not interrupt the execution?

Poster
Collaborator

Because it is in a lambda expression. Anyway, we can remove this test. It was only needed to verify that the file was written compressed.

Poster
Collaborator

I have kept the test and removed the lambda.
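For readers following the thread, this is a sketch of why dropping the lambda addresses the review comment; it is an illustration and not the code committed in the PR. A `Consumer` lambda cannot throw a checked `IOException`, so the exception had to be caught inside it, whereas a plain loop lets it propagate to the caller. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Arrays;
import java.util.List;

public class WriteLoopSketch {

    // A plain for-loop can declare IOException, so a failed write interrupts the execution
    // instead of being swallowed by a catch block inside a lambda.
    static void writeAll(List<String> jsonLines, Writer writer) throws IOException {
        for (String line : jsonLines) {
            writer.write(line);
            writer.write("\n");
        }
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        writeAll(Arrays.asList("{\"id\":\"c1\"}", "{\"id\":\"c2\"}"), out);
        System.out.print(out);
    }
}
```

In a unit test, declaring `throws IOException` on the test method has the same effect: a write failure makes the test fail rather than pass silently.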

claudio.atzori reviewed 3 years ago
@ -31,2 +31,3 @@
"paramRequired": true
}
},{
"paramName":"dt",
Owner

you can probably indent this json record in a more uniform way

claudio.atzori reviewed 3 years ago
@ -0,0 +23,4 @@
"paramDescription": "the name of the result table we are currently working on",
"paramRequired": true
}, {
"paramName":"rp",
Owner

you can probably indent this json record in a more uniform way

claudio.atzori requested changes 3 years ago
claudio.atzori left a comment
Owner

Overall, it looks pretty good; just some minor changes are needed:

  • cleanup of commented-out lines in the code and in the workflow definitions
  • formatting of the JSON configuration files
  • an exception swallowed in a disabled unit test
Poster
Collaborator

Requested changes done. There is also another change to check: two classes have been added to common to allow the mapping for the doiBoost result in the public format.

claudio.atzori changed title from WIP: dump of the results related to at least one project to dump of the results related to at least one project 3 years ago
claudio.atzori closed this pull request 3 years ago
The pull request has been merged as ada21ad920.
Reference: D-Net/dnet-hadoop#61