Compare commits

...

84 Commits

Author SHA1 Message Date
Miriam Baglioni 1c06aae1c6 [EOSC] removed filtering out of organizations without a name 2024-02-15 14:58:41 +01:00
Miriam Baglioni 95ff4a81dc changes to the properties file 2024-02-13 12:09:37 +01:00
Miriam Baglioni 6f2ec576f8 changed the .gitignore and refactoring 2024-01-08 12:09:30 +01:00
Miriam Baglioni 5165cab63c removed not needed file 2024-01-08 12:07:11 +01:00
Miriam Baglioni 09b03c1c8a Merge pull request 'eoscAddOrganizationProjectsAndRelations' (#7) from eoscAddOrganizationProjectsAndRelations into eoscDump
Reviewed-on: #7
2024-01-08 12:04:47 +01:00
Miriam Baglioni ec30ec0d29 - 2024-01-08 12:04:21 +01:00
Miriam Baglioni 231ed85aa1 - 2024-01-08 11:59:36 +01:00
Miriam Baglioni 9435c6756e refactoring 2024-01-04 12:17:28 +01:00
Miriam Baglioni f6f734bf5e adding use of the community APIs instead of the ISLookUp URL 2024-01-04 12:10:36 +01:00
Miriam Baglioni 1a196d03f9 changed dependency to the dhp-schema model to 4.17.2 2024-01-04 11:26:04 +01:00
Miriam Baglioni 9a06a552c4 fixed last issues 2023-11-17 16:49:06 +01:00
Miriam Baglioni db388ebc21 - 2023-11-16 12:17:42 +01:00
Miriam Baglioni ba713c509f Added filter for possible null results 2023-11-15 10:49:01 +01:00
Miriam Baglioni 0f602bae9d fixed issue. Need to extend wf 2023-11-14 11:33:50 +01:00
Miriam Baglioni fd242c1c87 - 2023-11-13 18:16:53 +01:00
Miriam Baglioni dbfd744f9c fixed issue 2023-11-13 11:30:02 +01:00
Miriam Baglioni d32b0861a2 - 2023-11-10 14:55:09 +01:00
Miriam Baglioni 094d8b996e added test classes 2023-11-07 11:46:31 +01:00
Miriam Baglioni 33eaacdd58 adding test classes 2023-10-27 17:46:00 +02:00
Miriam Baglioni 2fe2f0aa9e removing not needed method 2023-10-27 10:52:23 +02:00
Miriam Baglioni 52a6e2f7ff adding testing classes 2023-10-27 10:51:58 +02:00
Miriam Baglioni 94656b6530 Added parameter file. Fixed issues in path name 2023-10-27 09:29:02 +02:00
Miriam Baglioni 5a3f0d949c - 2023-10-26 08:48:52 +02:00
Miriam Baglioni c946f5c5b8 - 2023-10-25 15:38:46 +02:00
Miriam Baglioni aa48d52707 fixing issues 2023-10-25 15:18:44 +02:00
Miriam Baglioni 8d83b5173c extended the model to accommodate the new entities and relations to be dumped 2023-10-25 11:59:16 +02:00
Miriam Baglioni 25267c1689 extended the workflow to add the dump for the relations 2023-10-25 11:48:42 +02:00
Miriam Baglioni a821371af2 added code to dump the relations between organizations and projects in the subset of entities relevant for EOSC 2023-10-25 11:46:10 +02:00
Miriam Baglioni da19f117d8 added parameter to parameter file for the mapping of projects and relations 2023-10-25 11:17:06 +02:00
Miriam Baglioni 7fca920b5f Added extension to dump Projects and also relations of type resultProject 2023-10-25 11:14:46 +02:00
Miriam Baglioni a623883b62 Added extension to dump Organizations and also relations of type resultOrganization 2023-10-24 17:01:03 +02:00
Miriam Baglioni 64f10b6d31 Merge pull request 'Extend the relation inside the result' (#6) from eoscExtendRelation into eoscDump
Reviewed-on: #6
2023-10-24 16:22:07 +02:00
Miriam Baglioni 6f065371b6 first attempt to extend the model adding the result type in the relation 2023-10-24 10:34:15 +02:00
Miriam Baglioni 669e5b645c re implementation 2023-09-27 10:02:52 +02:00
Miriam Baglioni 483214e78f add the information of the eosc datasource id to the instance level 2023-09-26 11:14:45 +02:00
Miriam Baglioni a7866a26f7 first step to add eosc datasource id to instance 2023-09-21 17:44:10 +02:00
Miriam Baglioni 5ddae012c1 fixed the condition to select results to appear in the eosc dump 2023-09-19 10:18:47 +02:00
Miriam Baglioni e9d7271a1e moved maketararchive to avoid confusion with master 2023-08-04 19:43:32 +02:00
Miriam Baglioni 9ebcdc60bf moved parameter file; moved the dump out of the working dir to avoid rebuilding it from scratch in case of errors while making the tar 2023-08-04 17:23:50 +02:00
Miriam Baglioni df0e152318 moved maketar in common under the dump 2023-08-04 16:43:55 +02:00
Miriam Baglioni 8f960cc5d9 [eoscDump] updated the schema of the dump adding the fulltext field 2023-07-13 09:48:35 +02:00
Miriam Baglioni e31b2ef6ee [eoscDump] add fulltext at the level of the instance 2023-07-11 11:08:56 +02:00
Miriam Baglioni 831a4611d3 [eoscDump] hack to add non-deduplicated datasources + oaf fulltext at the level of the result 2023-07-10 18:42:28 +02:00
Miriam Baglioni ca94995e77 - 2023-07-07 18:31:52 +02:00
Miriam Baglioni c875a98ec6 - 2023-07-07 14:06:54 +02:00
Miriam Baglioni a535e926f7 [EOSCDump] added back the part related to B2FIND for the cf, otherwise no results from B2FIND would be included 2023-07-05 10:02:06 +02:00
Miriam Baglioni b3a4abb12b [EoscDump] changed set of parameters 2023-07-03 14:37:10 +02:00
Miriam Baglioni 000c88dd79 removed the hack to insert B2FIND results in the dump 2023-07-01 15:46:34 +02:00
Miriam Baglioni e9766bd25b make the upload use the okhttp3 version 2023-07-01 12:50:52 +02:00
Miriam Baglioni 6c50524c1e refactoring 2023-07-01 12:47:27 +02:00
Miriam Baglioni d35ffd201d refactoring 2023-07-01 12:32:48 +02:00
Miriam Baglioni d9993ee5a2 added workaround to include B2FIND 2023-06-05 08:14:22 +02:00
Miriam Baglioni ff61dca84a aligned with the last version of pom for production 2023-06-02 15:34:35 +02:00
Miriam Baglioni a25e8071c5 changed to mirror the new Zenodo API interaction 2023-04-19 13:29:27 +02:00
Miriam Baglioni ec1dac5847 remove all not needed classes for the branch. Here only stuff related to the EOSC 2023-02-28 08:55:38 +01:00
Miriam Baglioni da2e0bb1db added class for schema of relations 2023-01-26 18:30:46 +01:00
Miriam Baglioni 64511dad2c adding relations to the dump 2023-01-23 16:08:50 +02:00
Miriam Baglioni d5420960d1 extended code to select the relations between the set of results in eosc. Added also the related step in the workflow 2023-01-12 18:44:37 +01:00
Miriam Baglioni 57a0b96419 [eoscDump] minor 2022-12-27 10:35:35 +01:00
Miriam Baglioni 20409e1106 [EOSC DUMP] added resources needed for the review as test 2022-11-25 17:17:03 +01:00
Miriam Baglioni 59ba4f5d5e [Eosc Dump] added relative path to pom 2022-11-23 12:47:31 +01:00
Miriam Baglioni 3efb8be533 [EOSC Dump] modifying schema and code for the EOSC IF guidelines to accommodate elements with more than one correspondence 2022-11-22 16:42:13 +01:00
Miriam Baglioni 795d4cc1dc [EOSC DUMP ] modified to accommodate the new testing 2022-11-07 18:06:48 +01:00
Miriam Baglioni 326d0ff2da [EOSC DUMP ] added test to verify the addition of the indicators 2022-11-07 18:06:13 +01:00
Miriam Baglioni d2fc543fd8 [EOSC DUMP ] refactoring 2022-11-07 18:05:36 +01:00
Miriam Baglioni ff366dd5b4 [EOSC DUMP ] modified workflow to add the indicators taken from the action set 2022-11-07 18:04:27 +01:00
Miriam Baglioni 5742f63f39 [EOSC DUMP ] refactoring 2022-11-07 18:03:48 +01:00
Miriam Baglioni 849734fb55 [EOSC DUMP ] added param files and removed from existing one not needed params 2022-11-07 18:02:58 +01:00
Miriam Baglioni 74ff2c88d2 [EOSC DUMP ] added testing resources 2022-11-07 18:02:19 +01:00
Miriam Baglioni afc9f966d4 [EOSC DUMP SCHEMA] added description to some of the fields 2022-11-07 18:01:00 +01:00
Miriam Baglioni 1403f6415f [EOSC DUMP SCHEMA] refactoring 2022-11-07 17:59:43 +01:00
Miriam Baglioni e99a879bf0 [EOSC DUMP SCHEMA] new classes for EOSC dump 2022-11-07 17:59:12 +01:00
Miriam Baglioni 6366ed5890 [EOSC DUMP] added modification for the usage stats. To be used when the usagecounts will be inserted in the result 2022-11-07 17:58:39 +01:00
Miriam Baglioni 258a04f68a [EOSC DUMP] added the enum for the eosc dump type 2022-11-07 17:57:46 +01:00
Miriam Baglioni 8947a60c80 [EOSC DUMP] added the class to extend the result with the usage count values. This is temporary until the usage stats are imported in the graph - promotion of the action set in the wrong place 2022-11-07 17:56:57 +01:00
Miriam Baglioni 686f407bcd [EOSC DUMP] removed not needed parameters 2022-11-07 17:55:51 +01:00
Miriam Baglioni 4e57cc22a1 [EOSC Dump] refactoring 2022-10-13 11:57:45 +02:00
Miriam Baglioni 5489ed9834 [EOSC Dump] refactoring 2022-10-04 10:10:59 +02:00
Miriam Baglioni 6dc00dc801 [EOSC Dump] refactoring 2022-09-22 16:58:08 +02:00
Miriam Baglioni 4964c6018a [EOSC Dump] added check to avoid NPE 2022-09-22 16:37:37 +02:00
Miriam Baglioni 01f36ca3ab [EOSC Dump] refactoring 2022-09-21 17:29:50 +02:00
Miriam Baglioni 31fa465f6a [EOSC Dump] added support for the extension of the results by adding the affiliation information 2022-09-21 17:24:37 +02:00
Miriam Baglioni ae10ae9793 [EOSC DUMP] extension to the schema to add the organizations affiliated to the result 2022-09-13 12:10:47 +02:00
Miriam Baglioni 608122fe67 [EOSC DUMP] changed to make the dumped result an extension of the CommunityResult. The subjects field has been replaced by the subject field, modeled as a HashMap<String, List<Subject>> where Subject holds value and provenance 2022-08-11 14:23:55 +02:00
208 changed files with 8220 additions and 10450 deletions

.gitignore (vendored): 2 additions
View File

@ -26,3 +26,5 @@ spark-warehouse
/**/*.log
/**/.factorypath
/**/.scalafmt.conf
/*/*/job.properties
/**/job.properties

api/pom.xml (new file): 49 additions
View File

@ -0,0 +1,49 @@
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>eu.dnetlib.dhp</groupId>
<artifactId>dhp-graph-dump</artifactId>
<version>1.2.5-SNAPSHOT</version>
</parent>
<groupId>eu.dnetlib.dhp</groupId>
<artifactId>api</artifactId>
<version>1.2.5-SNAPSHOT</version>
<properties>
<maven.compiler.source>8</maven.compiler.source>
<maven.compiler.target>8</maven.compiler.target>
</properties>
<dependencies>
<dependency>
<groupId>dom4j</groupId>
<artifactId>dom4j</artifactId>
</dependency>
<dependency>
<groupId>jaxen</groupId>
<artifactId>jaxen</artifactId>
</dependency>
<dependency>
<groupId>eu.dnetlib.dhp</groupId>
<artifactId>dhp-common</artifactId>
<version>${project.version}</version>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-annotations</artifactId>
<scope>compile</scope>
</dependency>
</dependencies>
</project>

View File

@ -0,0 +1,75 @@
package eu.dnetlib.dhp.communityapi;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
/**
* @author miriam.baglioni
* @Date 06/10/23
*/
public class QueryCommunityAPI {
private static final String PRODUCTION_BASE_URL = "https://services.openaire.eu/openaire/";
private static String get(String geturl) throws IOException {
URL url = new URL(geturl);
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setDoOutput(true);
conn.setRequestMethod("GET");
int responseCode = conn.getResponseCode();
String body = getBody(conn);
conn.disconnect();
if (responseCode != HttpURLConnection.HTTP_OK)
throw new IOException("Unexpected code " + responseCode + body);
return body;
}
public static String communities() throws IOException {
return get(PRODUCTION_BASE_URL + "community/communities");
}
public static String community(String id) throws IOException {
return get(PRODUCTION_BASE_URL + "community/" + id);
}
public static String communityDatasource(String id) throws IOException {
return get(PRODUCTION_BASE_URL + "community/" + id + "/contentproviders");
}
public static String communityPropagationOrganization(String id) throws IOException {
return get(PRODUCTION_BASE_URL + "community/" + id + "/propagationOrganizations");
}
public static String communityProjects(String id, String page, String size) throws IOException {
return get(PRODUCTION_BASE_URL + "community/" + id + "/projects/" + page + "/" + size);
}
private static String getBody(HttpURLConnection conn) throws IOException {
String body = "{}";
try (BufferedReader br = new BufferedReader(
new InputStreamReader(conn.getInputStream(), "utf-8"))) {
StringBuilder response = new StringBuilder();
String responseLine = null;
while ((responseLine = br.readLine()) != null) {
response.append(responseLine.trim());
}
body = response.toString();
}
return body;
}
}
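A minimal usage sketch (illustrative only, not part of this changeset; the class and variable names are invented): the client above returns raw JSON strings, which callers pair with Jackson and the model classes introduced below.

import com.fasterxml.jackson.databind.ObjectMapper;

import eu.dnetlib.dhp.communityapi.QueryCommunityAPI;
import eu.dnetlib.dhp.communityapi.model.CommunityModel;
import eu.dnetlib.dhp.communityapi.model.CommunitySummary;

public class QueryCommunityAPIExample {
    public static void main(String[] args) throws Exception {
        // Fetch all registered communities and map them onto the model added below.
        String json = QueryCommunityAPI.communities();
        CommunitySummary communities = new ObjectMapper().readValue(json, CommunitySummary.class);
        for (CommunityModel c : communities) {
            System.out.println(c.getId() + " (" + c.getStatus() + ")");
        }
    }
}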

View File

@ -0,0 +1,30 @@
package eu.dnetlib.dhp.communityapi.model;
import com.fasterxml.jackson.annotation.JsonAutoDetect;
import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
@JsonAutoDetect
@JsonIgnoreProperties(ignoreUnknown = true)
public class CommunityContentprovider {
private String openaireId;
private String enabled;
public String getEnabled() {
return enabled;
}
public void setEnabled(String enabled) {
this.enabled = enabled;
}
public String getOpenaireId() {
return openaireId;
}
public void setOpenaireId(final String openaireId) {
this.openaireId = openaireId;
}
}

View File

@ -1,13 +1,13 @@
package eu.dnetlib.dhp.oa.graph.dump.complete;
package eu.dnetlib.dhp.communityapi.model;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
public class OrganizationMap extends HashMap<String, List<String>> {
public class CommunityEntityMap extends HashMap<String, List<String>> {
public OrganizationMap() {
public CommunityEntityMap() {
super();
}

View File

@ -0,0 +1,82 @@
package eu.dnetlib.dhp.communityapi.model;
import java.io.Serializable;
import java.util.List;
import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
/**
* @author miriam.baglioni
* @Date 06/10/23
*/
@JsonIgnoreProperties(ignoreUnknown = true)
public class CommunityModel implements Serializable {
private String id;
private String name;
private String description;
private String status;
private String type;
private List<String> subjects;
private String zenodoCommunity;
public List<String> getSubjects() {
return subjects;
}
public void setSubjects(List<String> subjects) {
this.subjects = subjects;
}
public String getZenodoCommunity() {
return zenodoCommunity;
}
public void setZenodoCommunity(String zenodoCommunity) {
this.zenodoCommunity = zenodoCommunity;
}
public String getType() {
return type;
}
public void setType(String type) {
this.type = type;
}
public String getStatus() {
return status;
}
public void setStatus(String status) {
this.status = status;
}
public String getId() {
return id;
}
public void setId(String id) {
this.id = id;
}
public String getName() {
return name;
}
public void setName(String name) {
this.name = name;
}
public String getDescription() {
return description;
}
public void setDescription(String description) {
this.description = description;
}
}

View File

@ -0,0 +1,15 @@
package eu.dnetlib.dhp.communityapi.model;
import java.io.Serializable;
import java.util.ArrayList;
/**
* @author miriam.baglioni
* @Date 06/10/23
*/
public class CommunitySummary extends ArrayList<CommunityModel> implements Serializable {
public CommunitySummary() {
super();
}
}

View File

@ -0,0 +1,51 @@
package eu.dnetlib.dhp.communityapi.model;
import java.io.Serializable;
import java.util.List;
import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
/**
* @author miriam.baglioni
* @Date 09/10/23
*/
@JsonIgnoreProperties(ignoreUnknown = true)
public class ContentModel implements Serializable {
private List<ProjectModel> content;
private Integer totalPages;
private Boolean last;
private Integer number;
public List<ProjectModel> getContent() {
return content;
}
public void setContent(List<ProjectModel> content) {
this.content = content;
}
public Integer getTotalPages() {
return totalPages;
}
public void setTotalPages(Integer totalPages) {
this.totalPages = totalPages;
}
public Boolean getLast() {
return last;
}
public void setLast(Boolean last) {
this.last = last;
}
public Integer getNumber() {
return number;
}
public void setNumber(Integer number) {
this.number = number;
}
}

View File

@ -0,0 +1,11 @@
package eu.dnetlib.dhp.communityapi.model;
import java.io.Serializable;
import java.util.ArrayList;
public class DatasourceList extends ArrayList<CommunityContentprovider> implements Serializable {
public DatasourceList() {
super();
}
}

View File

@ -0,0 +1,16 @@
package eu.dnetlib.dhp.communityapi.model;
import java.io.Serializable;
import java.util.ArrayList;
/**
* @author miriam.baglioni
* @Date 09/10/23
*/
public class OrganizationList extends ArrayList<String> implements Serializable {
public OrganizationList() {
super();
}
}

View File

@ -0,0 +1,44 @@
package eu.dnetlib.dhp.communityapi.model;
import java.io.Serializable;
import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
/**
* @author miriam.baglioni
* @Date 09/10/23
*/
@JsonIgnoreProperties(ignoreUnknown = true)
public class ProjectModel implements Serializable {
private String openaireId;
private String funder;
private String gratId;
public String getFunder() {
return funder;
}
public void setFunder(String funder) {
this.funder = funder;
}
public String getGratId() {
return gratId;
}
public void setGratId(String gratId) {
this.gratId = gratId;
}
public String getOpenaireId() {
return openaireId;
}
public void setOpenaireId(String openaireId) {
this.openaireId = openaireId;
}
}
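Since the projects endpoint is paged, a consumer drives the loop with the ContentModel fields above (totalPages/last/number). A sketch, assuming the imports of the previous example; the community id "eosc" and the page size are placeholders:

ObjectMapper mapper = new ObjectMapper();
int page = 0;
ContentModel content;
do {
    // Page through /community/{id}/projects/{page}/{size} until the last page is reached.
    String json = QueryCommunityAPI.communityProjects("eosc", String.valueOf(page), "100");
    content = mapper.readValue(json, ContentModel.class);
    for (ProjectModel p : content.getContent()) {
        System.out.println(p.getOpenaireId() + " - " + p.getFunder() + "/" + p.getGratId());
    }
    page++;
} while (!content.getLast());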

View File

@ -6,6 +6,7 @@
<artifactId>dhp-graph-dump</artifactId>
<groupId>eu.dnetlib.dhp</groupId>
<version>1.2.5-SNAPSHOT</version>
<relativePath>../pom.xml</relativePath>
</parent>
<modelVersion>4.0.0</modelVersion>

View File

@ -12,8 +12,7 @@ import com.fasterxml.jackson.databind.SerializationFeature;
import com.github.imifou.jsonschema.module.addon.AddonModule;
import com.github.victools.jsonschema.generator.*;
import eu.dnetlib.dhp.oa.model.community.CommunityResult;
import eu.dnetlib.dhp.oa.model.graph.*;
import eu.dnetlib.dhp.eosc.model.Result;
public class ExecCreateSchemas {
final static String DIRECTORY = "/eu/dnetlib/dhp/schema/dump/jsonschemas/";
@ -60,15 +59,8 @@ public class ExecCreateSchemas {
ExecCreateSchemas ecs = new ExecCreateSchemas();
ecs.init();
ecs.generate(GraphResult.class, DIRECTORY, "result_schema.json");
ecs.generate(ResearchCommunity.class, DIRECTORY, "community_infrastructure_schema.json");
ecs.generate(Datasource.class, DIRECTORY, "datasource_schema.json");
ecs.generate(Project.class, DIRECTORY, "project_schema.json");
ecs.generate(Relation.class, DIRECTORY, "relation_schema.json");
ecs.generate(Organization.class, DIRECTORY, "organization_schema.json");
ecs.generate(CommunityResult.class, DIRECTORY, "community_result_schema.json");
ecs.generate(Result.class, DIRECTORY, "eosc_result_schema.json");
}
}

View File

@ -1,5 +1,5 @@
package eu.dnetlib.dhp.oa.model;
package eu.dnetlib.dhp.eosc.model;
import java.io.Serializable;

View File

@ -1,5 +1,5 @@
package eu.dnetlib.dhp.oa.model;
package eu.dnetlib.dhp.eosc.model;
/**
* AccessRight. Used to represent the result access rights. It extends the eu.dnetlib.dhp.schema.dump.oaf.BestAccessRight

View File

@ -0,0 +1,46 @@
package eu.dnetlib.dhp.eosc.model;
import java.io.Serializable;
import java.util.List;
import com.github.imifou.jsonschema.module.addon.annotation.JsonSchema;
/**
* @author miriam.baglioni
* @Date 13/09/22
*/
public class Affiliation implements Serializable {
@JsonSchema(description = "the OpenAIRE id of the organizaiton")
private String id;
@JsonSchema(description = "the name of the organization")
private String name;
@JsonSchema(description = "the list of pids we have in OpenAIRE for the organization")
private List<OrganizationPid> pid;
public String getId() {
return id;
}
public void setId(String id) {
this.id = id;
}
public String getName() {
return name;
}
public void setName(String name) {
this.name = name;
}
public List<OrganizationPid> getPid() {
return pid;
}
public void setPid(List<OrganizationPid> pid) {
this.pid = pid;
}
}
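For illustration, an affiliation entry as it might be attached to a dumped result (the identifiers are placeholders, not real OpenAIRE ids):

Affiliation aff = new Affiliation();
aff.setId("20|openorgs____::0000");  // placeholder OpenAIRE organization id
aff.setName("Example University");
aff.setPid(java.util.Collections.singletonList(
    OrganizationPid.newInstance("ROR", "https://ror.org/00example")));  // placeholder pid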

View File

@ -1,5 +1,5 @@
package eu.dnetlib.dhp.oa.model;
package eu.dnetlib.dhp.eosc.model;
import java.io.Serializable;

View File

@ -1,8 +1,10 @@
package eu.dnetlib.dhp.oa.model;
package eu.dnetlib.dhp.eosc.model;
import java.io.Serializable;
import org.apache.commons.lang3.StringUtils;
import com.github.imifou.jsonschema.module.addon.annotation.JsonSchema;
/**

View File

@ -1,8 +1,10 @@
package eu.dnetlib.dhp.oa.model;
package eu.dnetlib.dhp.eosc.model;
import java.io.Serializable;
import org.apache.commons.lang3.StringUtils;
import com.github.imifou.jsonschema.module.addon.annotation.JsonSchema;
/**
@ -36,15 +38,15 @@ public class AuthorPid implements Serializable {
public static AuthorPid newInstance(AuthorPidSchemeValue pid, Provenance provenance) {
AuthorPid p = new AuthorPid();
p.id = pid;
p.provenance = provenance;
p.setId(pid);
p.setProvenance(provenance);
return p;
}
public static AuthorPid newInstance(AuthorPidSchemeValue pid) {
AuthorPid p = new AuthorPid();
p.id = pid;
p.setId(pid);
return p;
}

View File

@ -1,8 +1,10 @@
package eu.dnetlib.dhp.oa.model;
package eu.dnetlib.dhp.eosc.model;
import java.io.Serializable;
import org.apache.commons.lang3.StringUtils;
import com.github.imifou.jsonschema.module.addon.annotation.JsonSchema;
public class AuthorPidSchemeValue implements Serializable {
@ -37,4 +39,5 @@ public class AuthorPidSchemeValue implements Serializable {
return cf;
}
}

View File

@ -1,5 +1,5 @@
package eu.dnetlib.dhp.oa.model;
package eu.dnetlib.dhp.eosc.model;
import java.io.Serializable;

View File

@ -1,5 +1,5 @@
package eu.dnetlib.dhp.oa.model.community;
package eu.dnetlib.dhp.eosc.model;
import java.io.Serializable;
@ -32,16 +32,15 @@ public class CfHbKeyValue implements Serializable {
this.value = value;
}
public static CfHbKeyValue newInstance(String key, String value) {
CfHbKeyValue inst = new CfHbKeyValue();
inst.key = key;
inst.value = value;
return inst;
}
@JsonIgnore
public boolean isBlank() {
return StringUtils.isBlank(key) && StringUtils.isBlank(value);
}
public static CfHbKeyValue newInstance(String key, String value) {
CfHbKeyValue inst = new CfHbKeyValue();
inst.setKey(key);
inst.setValue(value);
return inst;
}
}

View File

@ -1,5 +1,5 @@
package eu.dnetlib.dhp.oa.model;
package eu.dnetlib.dhp.eosc.model;
import java.io.Serializable;
@ -27,10 +27,13 @@ public class Container implements Serializable {
@JsonSchema(description = "Name of the journal or conference")
private String name;
@JsonSchema(description = "The issn")
private String issnPrinted;
@JsonSchema(description = "The eissn")
private String issnOnline;
@JsonSchema(description = "The lissn")
private String issnLinking;
@JsonSchema(description = "End page")

View File

@ -1,5 +1,5 @@
package eu.dnetlib.dhp.oa.model.community;
package eu.dnetlib.dhp.eosc.model;
import java.util.List;
import java.util.Objects;
@ -8,8 +8,6 @@ import java.util.stream.Collectors;
import com.github.imifou.jsonschema.module.addon.annotation.JsonSchema;
import eu.dnetlib.dhp.oa.model.Provenance;
/**
* Reference to a relevant research infrastructure, initiative or community (RI/RC) among those collaborating with
* OpenAIRE. It extends eu.dnetlib.dhp.schema.dump.oaf.Qualifier with a parameter provenance of type

View File

@ -1,8 +1,10 @@
package eu.dnetlib.dhp.oa.model;
package eu.dnetlib.dhp.eosc.model;
import java.io.Serializable;
import org.apache.commons.lang3.StringUtils;
import com.github.imifou.jsonschema.module.addon.annotation.JsonSchema;
/**
@ -43,5 +45,4 @@ public class Country implements Serializable {
c.setLabel(label);
return c;
}
}

View File

@ -0,0 +1,67 @@
package eu.dnetlib.dhp.eosc.model;
import java.io.Serializable;
import com.github.imifou.jsonschema.module.addon.annotation.JsonSchema;
/**
* @author miriam.baglioni
* @Date 29/07/22
*/
public class EoscInteroperabilityFramework implements Serializable {
@JsonSchema(description = "EOSC-IF label")
private String label;
@JsonSchema(
description = "EOSC-IF local code. Later on it could be populated with a PID (e.g. DOI), but for the time being we stick to a more loose definition.")
private String code;
@JsonSchema(description = "EOSC-IF url to the guidelines")
private String url;
@JsonSchema(description = "EOSC-IF semantic relation (e.g. compliesWith)")
private String semanticRelation;
public String getLabel() {
return label;
}
public void setLabel(String label) {
this.label = label;
}
public String getCode() {
return code;
}
public void setCode(String code) {
this.code = code;
}
public String getUrl() {
return url;
}
public void setUrl(String url) {
this.url = url;
}
public String getSemanticRelation() {
return semanticRelation;
}
public void setSemanticRelation(String semanticRelation) {
this.semanticRelation = semanticRelation;
}
public static EoscInteroperabilityFramework newInstance(String code, String label, String url,
String semanticRelation) {
EoscInteroperabilityFramework eif = new EoscInteroperabilityFramework();
eif.label = label;
eif.code = code;
eif.url = url;
eif.semanticRelation = semanticRelation;
return eif;
}
}
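For illustration, an EOSC IF reference built through the factory above; every field value is a placeholder:

EoscInteroperabilityFramework eif = EoscInteroperabilityFramework.newInstance(
    "eosc-if-001",                     // code: a local code, not yet a PID
    "Example EOSC IF guideline",       // label
    "https://example.org/guidelines",  // url to the guidelines
    "compliesWith");                   // semantic relation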

View File

@ -1,6 +1,14 @@
package eu.dnetlib.dhp.oa.model.graph;
package eu.dnetlib.dhp.eosc.model;
/**
* @author miriam.baglioni
* @Date 25/10/23
*/
import com.github.imifou.jsonschema.module.addon.annotation.JsonSchema;
/**
@ -8,7 +16,7 @@ import com.github.imifou.jsonschema.module.addon.annotation.JsonSchema;
* eu.dnetlib.dhp.schema.dump.oaf.Funder with the following parameter: - private
* eu.dnetlib.dhp.schema.dump.oaf.graph.Fundings funding_stream to store the fundingstream
*/
public class Funder extends eu.dnetlib.dhp.oa.model.Funder {
public class Funder extends FunderShort {
@JsonSchema(description = "Description of the funding stream")
private Fundings funding_stream;

View File

@ -1,11 +1,15 @@
package eu.dnetlib.dhp.oa.model;
package eu.dnetlib.dhp.eosc.model;
import java.io.Serializable;
import com.github.imifou.jsonschema.module.addon.annotation.JsonSchema;
public class Funder implements Serializable {
/**
* @author miriam.baglioni
* @Date 26/01/23
*/
public class FunderShort implements Serializable {
@JsonSchema(description = "The short name of the funder (EC)")
private String shortName;
@ -40,4 +44,15 @@ public class Funder implements Serializable {
public void setName(String name) {
this.name = name;
}
@JsonSchema(description = "Stream of funding (e.g. for European Commission can be H2020 or FP7)")
private String fundingStream;
public String getFundingStream() {
return fundingStream;
}
public void setFundingStream(String fundingStream) {
this.fundingStream = fundingStream;
}
}

View File

@ -1,6 +1,14 @@
package eu.dnetlib.dhp.oa.model.graph;
package eu.dnetlib.dhp.eosc.model;
/**
* @author miriam.baglioni
* @Date 25/10/23
*/
import java.io.Serializable;
import com.github.imifou.jsonschema.module.addon.annotation.JsonSchema;

View File

@ -1,5 +1,5 @@
package eu.dnetlib.dhp.oa.model;
package eu.dnetlib.dhp.eosc.model;
import java.io.Serializable;

View File

@ -1,6 +1,14 @@
package eu.dnetlib.dhp.oa.model.graph;
package eu.dnetlib.dhp.eosc.model;
/**
* @author miriam.baglioni
* @Date 25/10/23
*/
import java.io.Serializable;
import com.github.imifou.jsonschema.module.addon.annotation.JsonSchema;

View File

@ -0,0 +1,33 @@
package eu.dnetlib.dhp.eosc.model;
import java.io.Serializable;
/**
* @author miriam.baglioni
* @Date 04/11/22
*/
public class Indicator implements Serializable {
private UsageCounts usageCounts;
public UsageCounts getUsageCounts() {
return usageCounts;
}
public void setUsageCounts(UsageCounts usageCounts) {
this.usageCounts = usageCounts;
}
public static Indicator newInstance(UsageCounts uc) {
Indicator i = new Indicator();
i.setUsageCounts(uc);
return i;
}
public static Indicator newInstance(String downloads, String views) {
Indicator i = new Indicator();
i.setUsageCounts(UsageCounts.newInstance(views, downloads));
return i;
}
}
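Note that the two factories take their arguments in different orders: the String overload is (downloads, views), while UsageCounts.newInstance is (views, downloads). Both calls below produce views=10 and downloads=25 (the values are placeholders):

Indicator fromCounts = Indicator.newInstance(UsageCounts.newInstance("10", "25"));
Indicator fromStrings = Indicator.newInstance("25", "10");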

View File

@ -1,30 +1,35 @@
package eu.dnetlib.dhp.oa.model;
package eu.dnetlib.dhp.eosc.model;
import java.io.Serializable;
import java.util.List;
import com.fasterxml.jackson.annotation.JsonInclude;
import com.github.imifou.jsonschema.module.addon.annotation.JsonSchema;
/**
* Represents the manifestations (i.e. different versions) of the result. For example: the pre-print and the published
* versions are two manifestations of the same research result. It has the following parameters: - license of type
* String to store the license applied to the instance. It corresponds to the value of the licence in the instance to be
* dumped - accessright of type eu.dnetlib.dhp.schema.dump.oaf.AccessRight to store the accessright of the instance. -
* type of type String to store the type of the instance as defined in the corresponding dnet vocabulary
* (dnet:publication_resource). It corresponds to the instancetype.classname of the instance to be mapped - url of type
* List<String> list of locations where the instance is accessible. It corresponds to url of the instance to be dumped -
* publicationdate of type String to store the publication date of the instance (dateofacceptance) - refereed of type
* String to store information about the review status of the instance. Possible values are 'Unknown',
* 'nonPeerReviewed', 'peerReviewed'. It corresponds to refereed.classname of the instance to be dumped
* - articleprocessingcharge of type APC to store the article processing charges possibly associated to the instance
* -pid of type List<ControlledField> that is the list of pids associated to the result coming from authoritative sources for that pid
* -alternateIdentifier of type List<ControlledField> that is the list of pids associated to the result coming from NON authoritative
* sources for that pid
* -measure list<KeyValue> to represent the measure computed for this instance (for example the Bip!Finder ones). It corresponds to measures in the model
* @author miriam.baglioni
* @Date 02/02/23
*/
/**
* It extends eu.dnetlib.dhp.dump.oaf.Instance with values related to the community dump. In the Result dump this
* information is not present because it is dumped as a set of relations between the result and the datasource. -
* hostedby of type eu.dnetlib.dhp.schema.dump.oaf.KeyValue to store the information about the source from which the
* instance can be viewed or downloaded. It is mapped against the hostedby parameter of the instance to be dumped and -
* key corresponds to hostedby.key - value corresponds to hostedby.value - collectedfrom of type
* eu.dnetlib.dhp.schema.dump.oaf.KeyValue to store the information about the source from which the instance has been
* collected. It is mapped against the collectedfrom parameter of the instance to be dumped and - key corresponds to
* collectedfrom.key - value corresponds to collectedfrom.value
*/
public class Instance implements Serializable {
@JsonSchema(description = "Information about the source from which the instance can be viewed or downloaded.")
private CfHbKeyValue hostedby;
@JsonSchema(description = "Information about the source from which the record has been collected")
@JsonInclude(JsonInclude.Include.NON_NULL)
private CfHbKeyValue collectedfrom;
@JsonSchema(description = "Measures computed for this instance, for example Bip!Finder ones")
private List<Measure> measures;
@ -59,6 +64,26 @@ public class Instance implements Serializable {
"nonPeerReviewed, UNKNOWN (as defined in https://api.openaire.eu/vocabularies/dnet:review_levels)")
private String refereed; // peer-review status
private String fulltext;
private List<String> eoscDsId;
public List<String> getEoscDsId() {
return eoscDsId;
}
public void setEoscDsId(List<String> eoscId) {
this.eoscDsId = eoscId;
}
public String getFulltext() {
return fulltext;
}
public void setFulltext(String fulltext) {
this.fulltext = fulltext;
}
public String getLicense() {
return license;
}
@ -138,4 +163,20 @@ public class Instance implements Serializable {
public void setMeasures(List<Measure> measures) {
this.measures = measures;
}
public CfHbKeyValue getHostedby() {
return hostedby;
}
public void setHostedby(CfHbKeyValue hostedby) {
this.hostedby = hostedby;
}
public CfHbKeyValue getCollectedfrom() {
return collectedfrom;
}
public void setCollectedfrom(CfHbKeyValue collectedfrom) {
this.collectedfrom = collectedfrom;
}
}
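A short sketch of the two fields this changeset adds to Instance (both values are placeholders):

Instance instance = new Instance();
instance.setFulltext("https://example.org/record/1/fulltext.pdf");   // direct link to the full-text
instance.setEoscDsId(java.util.Arrays.asList("eosc.example::ds1"));  // EOSC id(s) of the datasource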

View File

@ -1,5 +1,5 @@
package eu.dnetlib.dhp.oa.model;
package eu.dnetlib.dhp.eosc.model;
import java.io.Serializable;

View File

@ -1,5 +1,5 @@
package eu.dnetlib.dhp.oa.model;
package eu.dnetlib.dhp.eosc.model;
import java.io.Serializable;

View File

@ -1,5 +1,5 @@
package eu.dnetlib.dhp.oa.model;
package eu.dnetlib.dhp.eosc.model;
/**
* This Enum models the OpenAccess status, currently including only the values from Unpaywall

View File

@ -1,13 +1,11 @@
package eu.dnetlib.dhp.oa.model.graph;
package eu.dnetlib.dhp.eosc.model;
import java.io.Serializable;
import java.util.List;
import com.github.imifou.jsonschema.module.addon.annotation.JsonSchema;
import eu.dnetlib.dhp.oa.model.Country;
/**
* To represent the generic organization. It has the following parameters:
* - private String legalshortname to store the legalshortname of the organization

View File

@ -0,0 +1,43 @@
package eu.dnetlib.dhp.eosc.model;
import java.io.Serializable;
import com.github.imifou.jsonschema.module.addon.annotation.JsonSchema;
/**
* @author miriam.baglioni
* @Date 13/09/22
*/
public class OrganizationPid implements Serializable {
@JsonSchema(description = "the type of the organization pid")
private String type;
@JsonSchema(description = "the value of the organization pid")
private String value;
public String getType() {
return type;
}
public void setType(String type) {
this.type = type;
}
public String getValue() {
return value;
}
public void setValue(String value) {
this.value = value;
}
public static OrganizationPid newInstance(String type, String value) {
OrganizationPid op = new OrganizationPid();
op.type = type;
op.value = value;
return op;
}
}

View File

@ -1,6 +1,14 @@
package eu.dnetlib.dhp.oa.model.graph;
package eu.dnetlib.dhp.eosc.model;
/**
* @author miriam.baglioni
* @Date 25/10/23
*/
import java.io.Serializable;
import com.github.imifou.jsonschema.module.addon.annotation.JsonSchema;

View File

@ -1,6 +1,14 @@
package eu.dnetlib.dhp.oa.model.graph;
package eu.dnetlib.dhp.eosc.model;
/**
* @author miriam.baglioni
* @Date 25/10/23
*/
import java.io.Serializable;
import java.util.List;

View File

@ -0,0 +1,97 @@
package eu.dnetlib.dhp.eosc.model;
import java.io.Serializable;
import com.github.imifou.jsonschema.module.addon.annotation.JsonSchema;
/**
* @author miriam.baglioni
* @Date 26/01/23
*/
public class ProjectSummary implements Serializable {
@JsonSchema(description = "The OpenAIRE id for the project")
protected String id;// OpenAIRE id
@JsonSchema(description = "The grant agreement number")
protected String code;
@JsonSchema(description = "The acronym of the project")
protected String acronym;
protected String title;
@JsonSchema(description = "Information about the funder funding the project")
private FunderShort funder;
private Provenance provenance;
private Validated validated;
public void setValidated(Validated validated) {
this.validated = validated;
}
public Validated getValidated() {
return validated;
}
public Provenance getProvenance() {
return provenance;
}
public void setProvenance(Provenance provenance) {
this.provenance = provenance;
}
public FunderShort getFunder() {
return funder;
}
public void setFunder(FunderShort funders) {
this.funder = funders;
}
public String getId() {
return id;
}
public void setId(String id) {
this.id = id;
}
public String getCode() {
return code;
}
public void setCode(String code) {
this.code = code;
}
public String getAcronym() {
return acronym;
}
public void setAcronym(String acronym) {
this.acronym = acronym;
}
public String getTitle() {
return title;
}
public void setTitle(String title) {
this.title = title;
}
public static ProjectSummary newInstance(String id, String code, String acronym, String title, FunderShort funder) {
ProjectSummary project = new ProjectSummary();
project.setAcronym(acronym);
project.setCode(code);
project.setFunder(funder);
project.setId(id);
project.setTitle(title);
return project;
}
}
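For illustration, a ProjectSummary assembled with the factory above; only setters visible in this diff are used, and all values are placeholders:

FunderShort funder = new FunderShort();
funder.setName("Example Funder");
funder.setFundingStream("H2020");
ProjectSummary project = ProjectSummary.newInstance(
    "40|example_____::0001",  // placeholder OpenAIRE project id
    "123456",                 // placeholder grant agreement number
    "EXMPL",                  // acronym
    "Example Project",        // title
    funder);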

View File

@ -1,12 +1,13 @@
package eu.dnetlib.dhp.oa.model;
package eu.dnetlib.dhp.eosc.model;
import java.io.Serializable;
import org.apache.commons.lang3.StringUtils;
/**
* Indicates the process that produced (or provided) the information, and the trust associated to the information. It
* has two parameters: - provenance of type String to store the provenance of the information, - trust of type String to
* store the trust associated to the information
* @author miriam.baglioni
* @Date 26/01/23
*/
public class Provenance implements Serializable {
private String provenance;
@ -30,12 +31,13 @@ public class Provenance implements Serializable {
public static Provenance newInstance(String provenance, String trust) {
Provenance p = new Provenance();
p.provenance = provenance;
p.trust = trust;
p.setProvenance(provenance);
p.setTrust(trust);
return p;
}
public String toString() {
return provenance + trust;
}
// public String toStringProvenance(Provenance p) {
// return p.getProvenance() + p.getTrust();
// }
}

View File

@ -1,5 +1,5 @@
package eu.dnetlib.dhp.oa.model.graph;
package eu.dnetlib.dhp.eosc.model;
import java.io.Serializable;

View File

@ -1,13 +1,12 @@
package eu.dnetlib.dhp.oa.model.graph;
package eu.dnetlib.dhp.eosc.model;
import java.io.Serializable;
import java.util.Objects;
import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
import com.github.imifou.jsonschema.module.addon.annotation.JsonSchema;
import eu.dnetlib.dhp.oa.model.Provenance;
/**
* To represent the generic relation between two entities. It has the following parameters: - private Node source to
* represent the entity source of the relation - private Node target to represent the entity target of the relation -
@ -16,37 +15,44 @@ import eu.dnetlib.dhp.oa.model.Provenance;
*/
public class Relation implements Serializable {
@JsonSchema(description = "The node source in the relation")
private Node source;
private String source;
@JsonSchema(description = "The node target in the relation")
private Node target;
private String target;
@JsonSchema(description = "To represent the semantics of a relation between two entities")
@JsonIgnoreProperties(ignoreUnknown = true)
private RelType reltype;
@JsonSchema(description = "The reason why OpenAIRE holds the relation ")
@JsonIgnoreProperties(ignoreUnknown = true)
private Provenance provenance;
@JsonSchema(
description = "True if the relation is related to a project and it has been collected from an authoritative source (i.e. the funder)")
private boolean validated;
@JsonSchema(description = "The result type of the target for this relation")
@JsonIgnoreProperties(ignoreUnknown = true)
private String targetType;
@JsonSchema(description = "The date when the relation was collected from OpenAIRE")
private String validationDate;
public String getTargetType() {
return targetType;
}
public Node getSource() {
public void setTargetType(String targetType) {
this.targetType = targetType;
}
public String getSource() {
return source;
}
public void setSource(Node source) {
public void setSource(String source) {
this.source = source;
}
public Node getTarget() {
public String getTarget() {
return target;
}
public void setTarget(Node target) {
public void setTarget(String target) {
this.target = target;
}
@ -66,29 +72,13 @@ public class Relation implements Serializable {
this.provenance = provenance;
}
public void setValidated(boolean validate) {
this.validated = validate;
}
public boolean getValidated() {
return validated;
}
public void setValidationDate(String validationDate) {
this.validationDate = validationDate;
}
public String getValidationDate() {
return validationDate;
}
@Override
public int hashCode() {
return Objects.hash(source.getId(), target.getId(), reltype.getType() + ":" + reltype.getName());
return Objects.hash(source, target, reltype.getType() + ":" + reltype.getName());
}
public static Relation newInstance(Node source, Node target, RelType reltype, Provenance provenance) {
public static Relation newInstance(String source, String target, RelType reltype, Provenance provenance) {
Relation relation = new Relation();
relation.source = source;
relation.target = target;
@ -96,4 +86,12 @@ public class Relation implements Serializable {
relation.provenance = provenance;
return relation;
}
public static Relation newInstance(String source, String target) {
Relation relation = new Relation();
relation.source = source;
relation.target = target;
return relation;
}
}
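With this change a relation carries plain OpenAIRE identifiers instead of Node objects, and the new two-argument factory plus setTargetType cover the organization/project links added in this branch. A sketch with placeholder ids:

Relation rel = Relation.newInstance(
    "50|example_____::0001",   // placeholder result id (source)
    "20|example_____::0002");  // placeholder organization id (target)
rel.setTargetType("organization");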

View File

@ -1,75 +1,49 @@
package eu.dnetlib.dhp.oa.model;
package eu.dnetlib.dhp.eosc.model;
import java.io.Serializable;
import java.util.List;
import java.util.Map;
import com.fasterxml.jackson.annotation.JsonInclude;
import com.github.imifou.jsonschema.module.addon.annotation.JsonSchema;
/**
* To represent the dumped result. It will be extended in the dump for Research Communities - Research
* Initiative/Infrastructures. It has the following parameters:
* - author of type List<eu.dnetlib.dhp.schema.dump.oaf.Author> to describe the authors of a result. For each author in
* the result represented in the internal model one author in the external model is produced.
* - type of type String to represent the category of the result. Possible values are publication, dataset, software,
* other. It corresponds to resulttype.classname of the dumped result
* - language of type eu.dnetlib.dhp.schema.dump.oaf.Language to store information about the language of the result.
* It is dumped as - code corresponds to language.classid - value corresponds to language.classname
* - country of type List<eu.dnetlib.dhp.schema.dump.oaf.Country> to store the country list to which the result is
* associated. For each country in the result represented in the internal model one country in the external model is
* produced
* - subjects of type List<eu.dnetlib.dhp.dump.oaf.Subject> to store the subjects for the result. For each subject in
* the result represented in the internal model one subject in the external model is produced
* - maintitle of type String to store the main title of the result. It corresponds to the value of the first title in
* the result to be dumped having classid equal to "main title"
* - subtitle of type String to store the subtitle of the result. It corresponds to the value of the first title in
* the result to be dumped having classid equal to "subtitle"
* - description of type List<String> to store the description of the result. It corresponds to the list of
* description.value in the result represented in the internal model
* - publicationdate of type String to store the publication date. It corresponds to dateofacceptance.value in the
* result represented in the internal model
* - publisher of type String to store information about the publisher. It corresponds to publisher.value of the
* result represented in the internal model
* - embargoenddate of type String to store the embargo end date. It corresponds to embargoenddate.value of the result
* represented in the internal model
* - source of type List<String>, see the definition of the Dublin Core field dc:source. It corresponds to the list of
* source.value in the result represented in the internal model
* - format of type List<String>. It corresponds to the list of format.value in the result represented in the internal
* model
* - contributor of type List<String> to represent contributors of this result. It corresponds to the list of
* contributor.value in the result represented in the internal model
* - coverage of type String. It corresponds to the list of coverage.value in the result represented in the internal
* model
* - bestaccessright of type eu.dnetlib.dhp.schema.dump.oaf.AccessRight to store information about the openest access
* right associated to the manifestations of this research result. It corresponds to the same parameter in the result
* represented in the internal model
* - container of type eu.dnetlib.dhp.schema.dump.oaf.Container (only for results of type publication). It corresponds
* to the parameter journal of the result represented in the internal model
* - documentationUrl of type List<String> (only for results of type software) to store the URLs to the software
* documentation. It corresponds to the list of documentationUrl.value of the result represented in the internal model
* - codeRepositoryUrl of type String (only for results of type software) to store the URL to the repository with the
* source code. It corresponds to codeRepositoryUrl.value of the result represented in the internal model
* - programmingLanguage of type String (only for results of type software) to store the programming language. It
* corresponds to programmingLanguage.classid of the result represented in the internal model
* - contactperson of type List<String> (only for results of type other) to store the contact person for this result.
* It corresponds to the list of contactperson.value of the result represented in the internal model
* - contactgroup of type List<String> (only for results of type other) to store the information for the contact
* group. It corresponds to the list of contactgroup.value of the result represented in the internal model
* - tool of type List<String> (only for results of type other) to store information about tools useful for the
* interpretation and/or re-use of the research product. It corresponds to the list of tool.value in the result
* represented in the internal model
* - size of type String (only for results of type dataset) to store the size of the dataset. It corresponds to
* size.value in the result represented in the internal model
* - version of type String (only for results of type dataset) to store the version. It corresponds to version.value
* of the result represented in the internal model
* - geolocation of type List<eu.dnetlib.dhp.schema.dump.oaf.GeoLocation> (only for results of type dataset) to store
* geolocation information. For each geolocation element in the result represented in the internal model a GeoLocation
* in the external model is produced
* - id of type String to store the OpenAIRE id of the result. It corresponds to the id of the result represented in
* the internal model
* - originalId of type List<String> to store the original ids of the result. It corresponds to the originalId of the
* result represented in the internal model
* - pid of type List<eu.dnetlib.dhp.schema.dump.oaf.ControlledField> to store the persistent identifiers for the
* result. For each pid in the result represented in the internal model one pid in the external model is produced. The
* value correspondence is: - scheme corresponds to pid.qualifier.classid of the result represented in the internal
* model - value corresponds to the pid.value of the result represented in the internal model
* - dateofcollection of type String to store information about the time OpenAIRE collected the record. It corresponds
* to dateofcollection of the result represented in the internal model
* - lastupdatetimestamp of type String to store the timestamp of the last update of the record. It corresponds to
* lastupdatetimestamp of the record represented in the internal model
*
* @author miriam.baglioni
* @Date 29/07/22
*/
public class Result implements Serializable {
@JsonSchema(description = "Describes a reference to the EOSC Interoperability Framework (IF) Guidelines")
private List<EoscInteroperabilityFramework> eoscIF;
@JsonSchema(description = "The subject dumped by type associated to the result")
private Map<String, List<Subject>> subject;
@JsonSchema(description = "The list of keywords associated to the result")
private List<String> keywords;
@JsonSchema(description = "The list of organizations the result is affiliated to")
private List<Affiliation> affiliation;
@JsonSchema(description = "The indicators for this result")
private Indicator indicator;
@JsonSchema(description = "List of projects (i.e. grants) that (co-)funded the production ofn the research results")
private List<ProjectSummary> projects;
@JsonSchema(
description = "Reference to a relevant research infrastructure, initiative or community (RI/RC) among those collaborating with OpenAIRE. Please see https://connect.openaire.eu")
private List<Context> context;
@JsonSchema(description = "Information about the sources from which the result has been collected")
@JsonInclude(JsonInclude.Include.NON_NULL)
protected List<CfHbKeyValue> collectedfrom;
@JsonSchema(
description = "Each instance is one specific materialisation or version of the result. For example, you can have one result with three instance: one is the pre-print, one is the post-print, one is te published version")
private List<Instance> instance;
private List<Author> author;
// resulttype allows subclassing results into publications | datasets | software
@ -83,9 +57,6 @@ public class Result implements Serializable {
@JsonSchema(description = "The list of countries associated to this result")
private List<ResultCountry> country;
@JsonSchema(description = "Keywords associated to the result")
private List<Subject> subjects;
@JsonSchema(
description = "A name or title by which a scientific result is known. May be the title of a publication, of a dataset or the name of a piece of software.")
private String maintitle;
@ -168,6 +139,20 @@ public class Result implements Serializable {
@JsonSchema(description = "Timestamp of last update of the record in OpenAIRE")
private Long lastupdatetimestamp;
@JsonSchema(description = "The set of relations associated to this result")
private List<Relation> relations;
@JsonSchema(description = "The direct link to the full-text as collected from the data source")
private List<String> fulltext;
public List<String> getFulltext() {
return fulltext;
}
public void setFulltext(List<String> fulltext) {
this.fulltext = fulltext;
}
public Long getLastupdatetimestamp() {
return lastupdatetimestamp;
}
@ -248,14 +233,6 @@ public class Result implements Serializable {
this.country = country;
}
public List<Subject> getSubjects() {
return subjects;
}
public void setSubjects(List<Subject> subjects) {
this.subjects = subjects;
}
public String getMaintitle() {
return maintitle;
}
@ -416,4 +393,83 @@ public class Result implements Serializable {
this.geolocation = geolocation;
}
public List<Instance> getInstance() {
return instance;
}
public void setInstance(List<Instance> instance) {
this.instance = instance;
}
public List<CfHbKeyValue> getCollectedfrom() {
return collectedfrom;
}
public void setCollectedfrom(List<CfHbKeyValue> collectedfrom) {
this.collectedfrom = collectedfrom;
}
public List<ProjectSummary> getProjects() {
return projects;
}
public void setProjects(List<ProjectSummary> projects) {
this.projects = projects;
}
public List<Context> getContext() {
return context;
}
public void setContext(List<Context> context) {
this.context = context;
}
public List<Relation> getRelations() {
return relations;
}
public void setRelations(List<Relation> relations) {
this.relations = relations;
}
public Indicator getIndicator() {
return indicator;
}
public void setIndicator(Indicator indicator) {
this.indicator = indicator;
}
public List<String> getKeywords() {
return keywords;
}
public void setKeywords(List<String> keywords) {
this.keywords = keywords;
}
public List<EoscInteroperabilityFramework> getEoscIF() {
return eoscIF;
}
public void setEoscIF(List<EoscInteroperabilityFramework> eoscIF) {
this.eoscIF = eoscIF;
}
public Map<String, List<Subject>> getSubject() {
return subject;
}
public void setSubject(Map<String, List<Subject>> subject) {
this.subject = subject;
}
public List<Affiliation> getAffiliation() {
return affiliation;
}
public void setAffiliation(List<Affiliation> affiliation) {
this.affiliation = affiliation;
}
}
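As the last commit in the list describes, the flat subjects list became a map keyed by subject type. A sketch of populating it (the scheme name and values are placeholders; result stands for an instance of the Result class above):

Subject s = new Subject();
s.setValue("climate change");                                 // placeholder subject value
s.setProvenance(Provenance.newInstance("Harvested", "0.9"));  // placeholder provenance
Map<String, List<Subject>> subject = new java.util.HashMap<>();
subject.put("keyword", java.util.Collections.singletonList(s));
result.setSubject(subject);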

View File

@ -1,5 +1,5 @@
package eu.dnetlib.dhp.oa.model;
package eu.dnetlib.dhp.eosc.model;
import com.github.imifou.jsonschema.module.addon.annotation.JsonSchema;
@ -38,4 +38,5 @@ public class ResultCountry extends Country {
public static ResultCountry newInstance(String code, String label, String provenance, String trust) {
return newInstance(code, label, Provenance.newInstance(provenance, trust));
}
}

View File

@ -1,5 +1,5 @@
package eu.dnetlib.dhp.oa.model;
package eu.dnetlib.dhp.eosc.model;
import java.io.Serializable;

View File

@ -0,0 +1,34 @@
package eu.dnetlib.dhp.eosc.model;
import java.io.Serializable;
import com.github.imifou.jsonschema.module.addon.annotation.JsonSchema;
/**
* @author miriam.baglioni
* @Date 10/08/22
*/
public class Subject implements Serializable {
@JsonSchema(description = "Why this subject is associated to the result")
private Provenance provenance;
@JsonSchema(description = "The subject value")
private String value;
public Provenance getProvenance() {
return provenance;
}
public void setProvenance(Provenance provenance) {
this.provenance = provenance;
}
public String getValue() {
return value;
}
public void setValue(String value) {
this.value = value;
}
}

View File

@ -0,0 +1,43 @@
package eu.dnetlib.dhp.eosc.model;
import java.io.Serializable;
import org.apache.commons.lang3.StringUtils;
/**
* @author miriam.baglioni
* @Date 04/11/22
*/
public class UsageCounts implements Serializable {
private String views;
private String downloads;
public String getViews() {
return views;
}
public void setViews(String views) {
this.views = views;
}
public String getDownloads() {
return downloads;
}
public void setDownloads(String downloads) {
this.downloads = downloads;
}
public static UsageCounts newInstance(String views, String downloads) {
UsageCounts uc = new UsageCounts();
uc.setViews(views);
uc.setDownloads(downloads);
return uc;
}
public boolean isEmpty() {
return StringUtils.isEmpty(this.downloads) || StringUtils.isEmpty(this.views);
}
}

View File

@ -1,13 +1,13 @@
package eu.dnetlib.dhp.oa.model.community;
package eu.dnetlib.dhp.eosc.model;
import java.io.Serializable;
import org.apache.commons.lang3.StringUtils;
/**
* To store information about the funder funding the project related to the result. It has the following parameters: -
* shortName of type String to store the funder short name (e.g. AKA). - name of type String to store the funder name
* (e.g. Academy of Finland) - fundingStream of type String to store the funding stream - jurisdiction of type String to
* store the jurisdiction of the funder
* @author miriam.baglioni
* @Date 26/01/23
*/
public class Validated implements Serializable {
@ -32,8 +32,9 @@ public class Validated implements Serializable {
public static Validated newInstance(Boolean validated, String validationDate) {
Validated v = new Validated();
v.validatedByFunder = validated;
v.validationDate = validationDate;
v.setValidatedByFunder(validated);
v.setValidationDate(validationDate);
return v;
}
}

View File

@ -1,57 +0,0 @@
package eu.dnetlib.dhp.oa.model;
import java.io.Serializable;
import com.github.imifou.jsonschema.module.addon.annotation.JsonSchema;
/**
* This class to store the common information about the project that will be dumped for community and for the whole
* graph - private String id to store the id of the project (OpenAIRE id) - private String code to store the grant
* agreement of the project - private String acronym to store the acronym of the project - private String title to store
* the tile of the project
*/
public class Project implements Serializable {
@JsonSchema(description = "The OpenAIRE id for the project")
protected String id;// OpenAIRE id
@JsonSchema(description = "The grant agreement number")
protected String code;
@JsonSchema(description = "The acronym of the project")
protected String acronym;
protected String title;
public String getId() {
return id;
}
public void setId(String id) {
this.id = id;
}
public String getCode() {
return code;
}
public void setCode(String code) {
this.code = code;
}
public String getAcronym() {
return acronym;
}
public void setAcronym(String acronym) {
this.acronym = acronym;
}
public String getTitle() {
return title;
}
public void setTitle(String title) {
this.title = title;
}
}

View File

@ -1,40 +0,0 @@
package eu.dnetlib.dhp.oa.model;
import java.io.Serializable;
import com.github.imifou.jsonschema.module.addon.annotation.JsonSchema;
/**
* To represent keywords associated to the result. It has two parameters:
* - subject of type eu.dnetlib.dhp.schema.dump.oaf.SubjectSchemeValue to describe the subject. It is mapped as:
* - schema it corresponds to qualifier.classid of the dumped subject
* - value it corresponds to the subject value
* - provenance of type eu.dnetlib.dhp.schema.dump.oaf.Provenance to represent the provenance of the subject. It is dumped only if dataInfo
* is not null. In this case:
* - provenance corresponds to dataInfo.provenanceaction.classname
* - trust corresponds to dataInfo.trust
*/
public class Subject implements Serializable {
private SubjectSchemeValue subject;
@JsonSchema(description = "Why this subject is associated to the result")
private Provenance provenance;
public SubjectSchemeValue getSubject() {
return subject;
}
public void setSubject(SubjectSchemeValue subject) {
this.subject = subject;
}
public Provenance getProvenance() {
return provenance;
}
public void setProvenance(Provenance provenance) {
this.provenance = provenance;
}
}

View File

@@ -1,42 +0,0 @@
package eu.dnetlib.dhp.oa.model;
import java.io.Serializable;
import com.github.imifou.jsonschema.module.addon.annotation.JsonSchema;
public class SubjectSchemeValue implements Serializable {
@JsonSchema(
description = "OpenAIRE subject classification scheme (https://api.openaire.eu/vocabularies/dnet:subject_classification_typologies).")
private String scheme;
@JsonSchema(
description = "The value for the subject in the selected scheme. When the scheme is 'keyword', it means that the subject is free-text (i.e. not a term from a controlled vocabulary).")
private String value;
public String getScheme() {
return scheme;
}
public void setScheme(String scheme) {
this.scheme = scheme;
}
public String getValue() {
return value;
}
public void setValue(String value) {
this.value = value;
}
public static SubjectSchemeValue newInstance(String scheme, String value) {
SubjectSchemeValue cf = new SubjectSchemeValue();
cf.setScheme(scheme);
cf.setValue(value);
return cf;
}
}
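
A short sketch tying Subject and SubjectSchemeValue together (scheme and value are illustrative); the provenance is left unset since, per the Javadoc above, it is dumped only when dataInfo is present on the source record:

// free-text keyword, following the 'keyword' scheme convention described above
SubjectSchemeValue ssv = SubjectSchemeValue.newInstance("keyword", "open science");
Subject subject = new Subject();
subject.setSubject(ssv);
// subject.setProvenance(...) would be called only when dataInfo is available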

View File

@@ -1,40 +0,0 @@
package eu.dnetlib.dhp.oa.model.community;
import com.github.imifou.jsonschema.module.addon.annotation.JsonSchema;
import eu.dnetlib.dhp.oa.model.Instance;
/**
* It extends eu.dnetlib.dhp.dump.oaf.Instance with values related to the community dump. In the Result dump this
* information is not present because it is dumped as a set of relations between the result and the datasource. -
* hostedby of type eu.dnetlib.dhp.schema.dump.oaf.KeyValue to store the information about the source from which the
* instance can be viewed or downloaded. It is mapped against the hostedby parameter of the instance to be dumped and -
* key corresponds to hostedby.key - value corresponds to hostedby.value - collectedfrom of type
* eu.dnetlib.dhp.schema.dump.oaf.KeyValue to store the information about the source from which the instance has been
* collected. It is mapped against the collectedfrom parameter of the instance to be dumped and - key corresponds to
* collectedfrom.key - value corresponds to collectedfrom.value
*/
public class CommunityInstance extends Instance {
@JsonSchema(description = "Information about the source from which the instance can be viewed or downloaded.")
private CfHbKeyValue hostedby;
@JsonSchema(description = "Information about the source from which the record has been collected")
private CfHbKeyValue collectedfrom;
public CfHbKeyValue getHostedby() {
return hostedby;
}
public void setHostedby(CfHbKeyValue hostedby) {
this.hostedby = hostedby;
}
public CfHbKeyValue getCollectedfrom() {
return collectedfrom;
}
public void setCollectedfrom(CfHbKeyValue collectedfrom) {
this.collectedfrom = collectedfrom;
}
}

View File

@@ -1,70 +0,0 @@
package eu.dnetlib.dhp.oa.model.community;
import java.util.List;
import com.github.imifou.jsonschema.module.addon.annotation.JsonSchema;
import eu.dnetlib.dhp.oa.model.Result;
/**
* extends eu.dnetlib.dhp.schema.dump.oaf.Result with the following parameters: - projects of type
* List<eu.dnetlib.dhp.schema.dump.oaf.community.Project> to store the list of projects related to the result. The
* information is added after the result is mapped to the external model - context of type
* List<eu.dnetlib.dhp.schema.dump.oaf.community.Context> to store information about the RC RI related to the result.
* For each context in the result represented in the internal model one context in the external model is produced -
* collectedfrom of type List<eu.dnetlib.dhp.schema.dump.oaf.KeyValue> to store information about the sources from which
* the record has been collected. For each collectedfrom in the result represented in the internal model one
* collectedfrom in the external model is produced - instance of type
* List<eu.dnetlib.dhp.schema.dump.oaf.community.CommunityInstance> to store all the instances associated to the result.
* It corresponds to the same parameter in the result represented in the internal model
*/
public class CommunityResult extends Result {
@JsonSchema(description = "List of projects (i.e. grants) that (co-)funded the production ofn the research results")
private List<Project> projects;
@JsonSchema(
description = "Reference to a relevant research infrastructure, initiative or community (RI/RC) among those collaborating with OpenAIRE. Please see https://connect.openaire.eu")
private List<Context> context;
@JsonSchema(description = "Information about the sources from which the record has been collected")
protected List<CfHbKeyValue> collectedfrom;
@JsonSchema(
description = "Each instance is one specific materialisation or version of the result. For example, you can have one result with three instance: one is the pre-print, one is the post-print, one is te published version")
private List<CommunityInstance> instance;
public List<CommunityInstance> getInstance() {
return instance;
}
public void setInstance(List<CommunityInstance> instance) {
this.instance = instance;
}
public List<CfHbKeyValue> getCollectedfrom() {
return collectedfrom;
}
public void setCollectedfrom(List<CfHbKeyValue> collectedfrom) {
this.collectedfrom = collectedfrom;
}
public List<Project> getProjects() {
return projects;
}
public void setProjects(List<Project> projects) {
this.projects = projects;
}
public List<Context> getContext() {
return context;
}
public void setContext(List<Context> context) {
this.context = context;
}
}

View File

@@ -1,24 +0,0 @@
package eu.dnetlib.dhp.oa.model.community;
import com.github.imifou.jsonschema.module.addon.annotation.JsonSchema;
/**
* To store information about the funder funding the project related to the result. It has the following parameters: -
* shortName of type String to store the funder short name (e.g. AKA). - name of type String to store the funder name
* (e.g. Academy of Finland) - fundingStream of type String to store the funding stream - jurisdiction of type String to
* store the jurisdiction of the funder
*/
public class Funder extends eu.dnetlib.dhp.oa.model.Funder {
@JsonSchema(description = "Stream of funding (e.g. for European Commission can be H2020 or FP7)")
private String fundingStream;
public String getFundingStream() {
return fundingStream;
}
public void setFundingStream(String fundingStream) {
this.fundingStream = fundingStream;
}
}

View File

@@ -1,58 +0,0 @@
package eu.dnetlib.dhp.oa.model.community;
import com.github.imifou.jsonschema.module.addon.annotation.JsonSchema;
import eu.dnetlib.dhp.oa.model.Provenance;
/**
* To store information about the project related to the result. This information is not directly mapped from the result
* represented in the internal model because it is not there. The mapped result will be enriched with project
* information derived by relation between results and projects. Project extends eu.dnetlib.dhp.schema.dump.oaf.Project
* with the following parameters: - funder of type eu.dnetlib.dhp.schema.dump.oaf.community.Funder to store information
* about the funder funding the project - provenance of type eu.dnetlib.dhp.schema.dump.oaf.Provenance to store
* information about the provenance of the association between the result and the project
*/
public class Project extends eu.dnetlib.dhp.oa.model.Project {
@JsonSchema(description = "Information about the funder funding the project")
private Funder funder;
private Provenance provenance;
private Validated validated;
public void setValidated(Validated validated) {
this.validated = validated;
}
public Validated getValidated() {
return validated;
}
public Provenance getProvenance() {
return provenance;
}
public void setProvenance(Provenance provenance) {
this.provenance = provenance;
}
public Funder getFunder() {
return funder;
}
public void setFunder(Funder funders) {
this.funder = funders;
}
public static Project newInstance(String id, String code, String acronym, String title, Funder funder) {
Project project = new Project();
project.setAcronym(acronym);
project.setCode(code);
project.setFunder(funder);
project.setId(id);
project.setTitle(title);
return project;
}
}
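
Note that newInstance populates only the identifying fields; the funder, provenance and validation information must be set separately. A hedged sketch (ids, codes and dates are illustrative, and the shortName/name setters are assumed to be inherited from eu.dnetlib.dhp.oa.model.Funder):

Funder funder = new Funder();
funder.setFundingStream("H2020"); // declared on this community Funder subclass
// illustrative identifiers, grant number, acronym and title
Project project = Project
	.newInstance("40|corda__h2020::1234", "101017452", "ACRONYM", "Project title", funder);
project.setValidated(Validated.newInstance(true, "2021-03-01")); // illustrative validation date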

View File

@@ -1,21 +0,0 @@
package eu.dnetlib.dhp.oa.model.graph;
import java.io.Serializable;
public class Constants implements Serializable {
// collectedFrom goes with isProvidedBy -> taken from ModelSupport
public static final String HOSTED_BY = "isHostedBy";
public static final String HOSTS = "hosts";
// for community results we use isRelatedTo
public static final String RESULT_ENTITY = "result";
public static final String DATASOURCE_ENTITY = "datasource";
public static final String CONTEXT_ENTITY = "context";
public static final String CONTEXT_ID = "60";
public static final String CONTEXT_NS_PREFIX = "context____";
}

View File

@@ -1,346 +0,0 @@
package eu.dnetlib.dhp.oa.model.graph;
import java.io.Serializable;
import java.util.List;
import com.github.imifou.jsonschema.module.addon.annotation.JsonSchema;
import eu.dnetlib.dhp.oa.model.Container;
/**
* To store information about the datasource OpenAIRE collects information from. It contains the following parameters: -
* id of type String to store the OpenAIRE id for the datasource. It corresponds to the parameter id of the datasource
* represented in the internal model - originalId of type List<String> to store the list of original ids associated to
* the datasource. It corresponds to the parameter originalId of the datasource represented in the internal model. The
* null values are filtered out - pid of type List<eu.dnetlib.dhp.schema.dump.oaf.ControlledField> to store the
* persistent identifiers for the datasource. For each pid in the datasource represented in the internal model one pid
* in the external model is produced as : - schema corresponds to pid.qualifier.classid of the datasource represented in
* the internal model - value corresponds to pid.value of the datasource represented in the internal model -
* datasourceType of type eu.dnetlib.dhp.schema.dump.oaf.ControlledField to store the datasource type (e.g.
* pubsrepository::institutional, Institutional Repository) as in the dnet vocabulary dnet:datasource_typologies. It
* corresponds to datasourcetype of the datasource represented in the internal model and : - code corresponds to
* datasourcetype.classid - value corresponds to datasourcetype.classname - openairecompatibility of type String to
* store information about the OpenAIRE compatibility of the ingested results (which guidelines they are compliant to).
* It corresponds to openairecompatibility.classname of the datasource represented in the internal model - officialname
* of type String to store the official name of the datasource. It corresponds to officialname.value of the datasource
* represented in the internal model - englishname of type String to store the English name of the datasource. It
* corresponds to englishname.value of the datasource represented in the internal model - websiteurl of type String to
* store the URL of the website of the datasource. It corresponds to websiteurl.value of the datasource represented in
* the internal model - logourl of type String to store the URL of the logo for the datasource. It corresponds to
* logourl.value of the datasource represented in the internal model - dateofvalidation of type String to store the date
* of validation against the guidelines for the datasource records. It corresponds to dateofvalidation.value of the
* datasource represented in the internal model - description of type String to store the description for the
* datasource. It corresponds to description.value of the datasource represented in the internal model
*/
public class Datasource implements Serializable {
@JsonSchema(description = "The OpenAIRE id of the data source")
private String id; // string
@JsonSchema(description = "Original identifiers for the datasource")
private List<String> originalId; // list string
@JsonSchema(description = "Persistent identifiers of the datasource")
private List<DatasourcePid> pid; // list<String>
@JsonSchema(
description = "The type of the datasource. See https://api.openaire.eu/vocabularies/dnet:datasource_typologies")
private DatasourceSchemeValue datasourcetype; // value
@JsonSchema(
description = "OpenAIRE guidelines the data source comply with. See also https://guidelines.openaire.eu.")
private String openairecompatibility; // value
@JsonSchema(description = "The official name of the datasource")
private String officialname; // string
@JsonSchema(description = "The English name of the datasource")
private String englishname; // string
private String websiteurl; // string
private String logourl; // string
@JsonSchema(description = "The date of last validation against the OpenAIRE guidelines for the datasource records")
private String dateofvalidation; // string
private String description; // description
@JsonSchema(description = "List of subjects associated to the datasource")
private List<String> subjects; // List<String>
// opendoar specific fields (od*)
@JsonSchema(description = "The languages present in the data source's content, as defined by OpenDOAR.")
private List<String> languages; // odlanguages List<String>
@JsonSchema(description = "Types of content in the data source, as defined by OpenDOAR")
private List<String> contenttypes; // odcontent types List<String>
// re3data fields
@JsonSchema(description = "Releasing date of the data source, as defined by re3data.org")
private String releasestartdate; // string
@JsonSchema(
description = "Date when the data source went offline or stopped ingesting new research data. As defined by re3data.org")
private String releaseenddate; // string
@JsonSchema(
description = "The URL of a mission statement describing the designated community of the data source. As defined by re3data.org")
private String missionstatementurl; // string
@JsonSchema(
description = "Type of access to the data source, as defined by re3data.org. Possible values: " +
"{open, restricted, closed}")
private String accessrights; // databaseaccesstype string
// {open, restricted or closed}
@JsonSchema(description = "Type of data upload. As defined by re3data.org: one of {open, restricted,closed}")
private String uploadrights; // datauploadtype string
@JsonSchema(
description = "Access restrinctions to the data source, as defined by re3data.org. One of {feeRequired, registration, other}")
private String databaseaccessrestriction; // string
@JsonSchema(
description = "Upload restrictions applied by the datasource, as defined by re3data.org. One of {feeRequired, registration, other}")
private String datauploadrestriction; // string
@JsonSchema(description = "As defined by redata.org: 'yes' if the data source supports versioning, 'no' otherwise.")
private Boolean versioning; // boolean
@JsonSchema(
description = "The URL of the data source providing information on how to cite its items. As defined by re3data.org.")
private String citationguidelineurl; // string
// {yes, no, unknown}
@JsonSchema(
description = "The persistent identifier system that is used by the data source. As defined by re3data.org")
private String pidsystems; // string
@JsonSchema(
description = "The certificate, seal or standard the data source complies with. As defined by re3data.org.")
private String certificates; // string
@JsonSchema(description = "Policies of the data source, as defined in OpenDOAR.")
private List<String> policies; //
@JsonSchema(description = "Information about the journal, if this data source is of type Journal.")
private Container journal; // issn etc del Journal
public String getId() {
return id;
}
public void setId(String id) {
this.id = id;
}
public List<String> getOriginalId() {
return originalId;
}
public void setOriginalId(List<String> originalId) {
this.originalId = originalId;
}
public List<DatasourcePid> getPid() {
return pid;
}
public void setPid(List<DatasourcePid> pid) {
this.pid = pid;
}
public DatasourceSchemeValue getDatasourcetype() {
return datasourcetype;
}
public void setDatasourcetype(DatasourceSchemeValue datasourcetype) {
this.datasourcetype = datasourcetype;
}
public String getOpenairecompatibility() {
return openairecompatibility;
}
public void setOpenairecompatibility(String openairecompatibility) {
this.openairecompatibility = openairecompatibility;
}
public String getOfficialname() {
return officialname;
}
public void setOfficialname(String officialname) {
this.officialname = officialname;
}
public String getEnglishname() {
return englishname;
}
public void setEnglishname(String englishname) {
this.englishname = englishname;
}
public String getWebsiteurl() {
return websiteurl;
}
public void setWebsiteurl(String websiteurl) {
this.websiteurl = websiteurl;
}
public String getLogourl() {
return logourl;
}
public void setLogourl(String logourl) {
this.logourl = logourl;
}
public String getDateofvalidation() {
return dateofvalidation;
}
public void setDateofvalidation(String dateofvalidation) {
this.dateofvalidation = dateofvalidation;
}
public String getDescription() {
return description;
}
public void setDescription(String description) {
this.description = description;
}
public List<String> getSubjects() {
return subjects;
}
public void setSubjects(List<String> subjects) {
this.subjects = subjects;
}
public List<String> getLanguages() {
return languages;
}
public void setLanguages(List<String> languages) {
this.languages = languages;
}
public List<String> getContenttypes() {
return contenttypes;
}
public void setContenttypes(List<String> contenttypes) {
this.contenttypes = contenttypes;
}
public String getReleasestartdate() {
return releasestartdate;
}
public void setReleasestartdate(String releasestartdate) {
this.releasestartdate = releasestartdate;
}
public String getReleaseenddate() {
return releaseenddate;
}
public void setReleaseenddate(String releaseenddate) {
this.releaseenddate = releaseenddate;
}
public String getMissionstatementurl() {
return missionstatementurl;
}
public void setMissionstatementurl(String missionstatementurl) {
this.missionstatementurl = missionstatementurl;
}
public String getAccessrights() {
return accessrights;
}
public void setAccessrights(String accessrights) {
this.accessrights = accessrights;
}
public String getUploadrights() {
return uploadrights;
}
public void setUploadrights(String uploadrights) {
this.uploadrights = uploadrights;
}
public String getDatabaseaccessrestriction() {
return databaseaccessrestriction;
}
public void setDatabaseaccessrestriction(String databaseaccessrestriction) {
this.databaseaccessrestriction = databaseaccessrestriction;
}
public String getDatauploadrestriction() {
return datauploadrestriction;
}
public void setDatauploadrestriction(String datauploadrestriction) {
this.datauploadrestriction = datauploadrestriction;
}
public Boolean getVersioning() {
return versioning;
}
public void setVersioning(Boolean versioning) {
this.versioning = versioning;
}
public String getCitationguidelineurl() {
return citationguidelineurl;
}
public void setCitationguidelineurl(String citationguidelineurl) {
this.citationguidelineurl = citationguidelineurl;
}
public String getPidsystems() {
return pidsystems;
}
public void setPidsystems(String pidsystems) {
this.pidsystems = pidsystems;
}
public String getCertificates() {
return certificates;
}
public void setCertificates(String certificates) {
this.certificates = certificates;
}
public List<String> getPolicies() {
return policies;
}
public void setPolicies(List<String> policiesr3) {
this.policies = policiesr3;
}
public Container getJournal() {
return journal;
}
public void setJournal(Container journal) {
this.journal = journal;
}
}

View File

@@ -1,41 +0,0 @@
package eu.dnetlib.dhp.oa.model.graph;
import java.io.Serializable;
import com.github.imifou.jsonschema.module.addon.annotation.JsonSchema;
public class DatasourcePid implements Serializable {
@JsonSchema(description = "The scheme used to express the value ")
private String scheme;
@JsonSchema(description = "The value expressed in the scheme ")
private String value;
public String getScheme() {
return scheme;
}
public void setScheme(String scheme) {
this.scheme = scheme;
}
public String getValue() {
return value;
}
public void setValue(String value) {
this.value = value;
}
public static DatasourcePid newInstance(String scheme, String value) {
DatasourcePid cf = new DatasourcePid();
cf.setScheme(scheme);
cf.setValue(value);
return cf;
}
}

View File

@@ -1,41 +0,0 @@
package eu.dnetlib.dhp.oa.model.graph;
import java.io.Serializable;
import com.github.imifou.jsonschema.module.addon.annotation.JsonSchema;
// TODO change the DatasourceSchemeValue to DatasourceKeyValue. The scheme is always the dnet one. What we show
// here is the entry in the scheme (the key) and its understandable value
public class DatasourceSchemeValue implements Serializable {
@JsonSchema(description = "The scheme used to express the value (i.e. pubsrepository::journal)")
private String scheme;
@JsonSchema(description = "The value expressed in the scheme (Journal)")
private String value;
public String getScheme() {
return scheme;
}
public void setScheme(String scheme) {
this.scheme = scheme;
}
public String getValue() {
return value;
}
public void setValue(String value) {
this.value = value;
}
public static DatasourceSchemeValue newInstance(String scheme, String value) {
DatasourceSchemeValue cf = new DatasourceSchemeValue();
cf.setScheme(scheme);
cf.setValue(value);
return cf;
}
}

View File

@@ -1,28 +0,0 @@
package eu.dnetlib.dhp.oa.model.graph;
import java.util.List;
import com.github.imifou.jsonschema.module.addon.annotation.JsonSchema;
import eu.dnetlib.dhp.oa.model.Instance;
import eu.dnetlib.dhp.oa.model.Result;
/**
* It extends the eu.dnetlib.dhp.schema.dump.oaf.Result with - instance of type
* List<eu.dnetlib.dhp.schema.dump.oaf.Instance> to store all the instances associated to the result. It corresponds to
* the same parameter in the result represented in the internal model
*/
public class GraphResult extends Result {
@JsonSchema(
description = "Each instance is one specific materialisation or version of the result. For example, you can have one result with three instance: one is the pre-print, one is the post-print, one is te published version")
private List<Instance> instance;
public List<Instance> getInstance() {
return instance;
}
public void setInstance(List<Instance> instance) {
this.instance = instance;
}
}

View File

@@ -1,82 +0,0 @@
package eu.dnetlib.dhp.oa.model.graph;
import java.io.Serializable;
/**
* To store information about the classification for the project. The classification depends on the programme. For example
* H2020-EU.3.4.5.3 can be classified as
* H2020-EU.3. => Societal Challenges (level1)
* H2020-EU.3.4. => Transport (level2)
* H2020-EU.3.4.5. => CLEANSKY2 (level3)
* H2020-EU.3.4.5.3. => IADP Fast Rotorcraft (level4)
*
* We decided to explicitly represent up to three levels in the classification.
*
* H2020Classification has the following parameters:
* - private Programme programme to store the information about the programme related to this classification
* - private String level1 to store the information about the level 1 of the classification (Priority or Pillar of the EC)
* - private String level2 to store the information about the level2 of the classification (Objectives (?))
* - private String level3 to store the information about the level3 of the classification
* - private String classification to store the entire classification related to the programme
*/
public class H2020Classification implements Serializable {
private Programme programme;
private String level1;
private String level2;
private String level3;
private String classification;
public Programme getProgramme() {
return programme;
}
public void setProgramme(Programme programme) {
this.programme = programme;
}
public String getLevel1() {
return level1;
}
public void setLevel1(String level1) {
this.level1 = level1;
}
public String getLevel2() {
return level2;
}
public void setLevel2(String level2) {
this.level2 = level2;
}
public String getLevel3() {
return level3;
}
public void setLevel3(String level3) {
this.level3 = level3;
}
public String getClassification() {
return classification;
}
public void setClassification(String classification) {
this.classification = classification;
}
public static H2020Classification newInstance(String programme_code, String programme_description, String level1,
String level2, String level3, String classification) {
H2020Classification h2020classification = new H2020Classification();
h2020classification.programme = Programme.newInstance(programme_code, programme_description);
h2020classification.level1 = level1;
h2020classification.level2 = level2;
h2020classification.level3 = level3;
h2020classification.classification = classification;
return h2020classification;
}
}
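
Using the H2020-EU.3.4.5.3 example from the Javadoc, the factory would be invoked as below; the format of the full classification string is an assumption, since it is not specified above:

H2020Classification classification = H2020Classification
	.newInstance(
		"H2020-EU.3.4.5.3.", "IADP Fast Rotorcraft", // programme code and description
		"Societal Challenges", // level1
		"Transport", // level2
		"CLEANSKY2", // level3
		"H2020-EU.3. | H2020-EU.3.4. | H2020-EU.3.4.5. | H2020-EU.3.4.5.3."); // assumed format of the full classification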

View File

@@ -1,38 +0,0 @@
package eu.dnetlib.dhp.oa.model.graph;
import java.io.Serializable;
/**
* To represent the generic node in a relation. It has the following parameters: - private String id the openaire id of
* the entity in the relation - private String type the type of the entity in the relation. Consider the generic
* relation between a Result R and a Project P, the node representing R will have as id the id of R and as type result,
* while the node representing the project will have as id the id of the project and as type project
*/
public class Node implements Serializable {
private String id;
private String type;
public String getId() {
return id;
}
public void setId(String id) {
this.id = id;
}
public String getType() {
return type;
}
public void setType(String type) {
this.type = type;
}
public static Node newInstance(String id, String type) {
Node node = new Node();
node.id = id;
node.type = type;
return node;
}
}
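
Following the result/project example in the Javadoc, the two endpoints of such a relation would be built as (ids are illustrative):

Node source = Node.newInstance("50|doi_________::ab12cd34", "result");
Node target = Node.newInstance("40|corda__h2020::ef56ab78", "project");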

View File

@@ -1,42 +0,0 @@
package eu.dnetlib.dhp.oa.model.graph;
import java.io.Serializable;
import com.github.imifou.jsonschema.module.addon.annotation.JsonSchema;
public class OrganizationPid implements Serializable {
@JsonSchema(description = "The scheme of the identifier (i.e. isni)")
private String scheme;
@JsonSchema(description = "The value in the schema (i.e. 0000000090326370)")
private String value;
public String getScheme() {
return scheme;
}
public void setScheme(String scheme) {
this.scheme = scheme;
}
public String getValue() {
return value;
}
public void setValue(String value) {
this.value = value;
}
public static OrganizationPid newInstance(String scheme, String value) {
OrganizationPid cf = new OrganizationPid();
cf.setScheme(scheme);
cf.setValue(value);
return cf;
}
}

View File

@@ -1,24 +0,0 @@
package eu.dnetlib.dhp.oa.model.graph;
import java.util.List;
import com.github.imifou.jsonschema.module.addon.annotation.JsonSchema;
/**
* To represent RC entities. It extends eu.dnetlib.dhp.dump.oaf.graph.ResearchInitiative by adding the parameter subject
* to store the list of subjects related to the community
*/
public class ResearchCommunity extends ResearchInitiative {
@JsonSchema(
description = "Only for research communities: the list of the subjects associated to the research community")
private List<String> subject;
public List<String> getSubject() {
return subject;
}
public void setSubject(List<String> subject) {
this.subject = subject;
}
}

View File

@@ -1,89 +0,0 @@
package eu.dnetlib.dhp.oa.model.graph;
import java.io.Serializable;
import com.github.imifou.jsonschema.module.addon.annotation.JsonSchema;
/**
* To represent entity of type RC/RI. It has the following parameters, which are mostly derived by the profile
* - private
String id to store the openaire id for the entity. It has as code 00 and will be created as
* 00|context_____::md5(originalId) private
* String originalId to store the id of the context as provided in the profile
* (i.e. mes)
* - private String name to store the name of the context (got from the label attribute in the context
* definition)
* - private String type to store the type of the context (i.e.: research initiative or research community)
* - private String description to store the description of the context as given in the profile
- private String
* zenodo_community to store the zenodo community associated to the context (main zenodo community)
*/
public class ResearchInitiative implements Serializable {
@JsonSchema(description = "The OpenAIRE id for the community/research infrastructure")
private String id; // openaireId
@JsonSchema(description = "The acronym of the community")
private String acronym; // context id
@JsonSchema(description = "The long name of the community")
private String name; // context name
@JsonSchema(description = "One of {Research Community, Research infrastructure}")
private String type; // context type: research initiative or research community
@JsonSchema(description = "Description of the research community/research infrastructure")
private String description;
@JsonSchema(
description = "The URL of the Zenodo community associated to the Research community/Research infrastructure")
private String zenodo_community;
public String getZenodo_community() {
return zenodo_community;
}
public void setZenodo_community(String zenodo_community) {
this.zenodo_community = zenodo_community;
}
public String getType() {
return type;
}
public void setType(String type) {
this.type = type;
}
public String getId() {
return id;
}
public void setId(String id) {
this.id = id;
}
public String getName() {
return name;
}
public void setName(String label) {
this.name = label;
}
public String getAcronym() {
return acronym;
}
public void setAcronym(String acronym) {
this.acronym = acronym;
}
public String getDescription() {
return description;
}
public void setDescription(String description) {
this.description = description;
}
}
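
A sketch of the id convention described in the Javadoc (00|context_____::md5(originalId)); the helper below is illustrative and not part of the model:

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

// hypothetical helper showing how the OpenAIRE id of a context could be derived
static String contextOpenaireId(String originalId) throws Exception {
	MessageDigest md5 = MessageDigest.getInstance("MD5");
	StringBuilder hex = new StringBuilder();
	for (byte b : md5.digest(originalId.getBytes(StandardCharsets.UTF_8)))
		hex.append(String.format("%02x", b));
	return "00|context_____::" + hex;
}
// contextOpenaireId("mes") -> "00|context_____::" followed by the md5 of "mes"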

View File

@@ -0,0 +1,602 @@
{
"$schema" : "http://json-schema.org/draft-07/schema#",
"definitions" : {
"CfHbKeyValue" : {
"type" : "object",
"properties" : {
"key" : {
"type" : "string",
"description" : "the OpenAIRE identifier of the data source"
},
"value" : {
"type" : "string",
"description" : "the name of the data source"
}
}
},
"Provenance" : {
"type" : "object",
"properties" : {
"provenance" : {
"type" : "string"
},
"trust" : {
"type" : "string"
}
}
},
"ResultPid" : {
"type" : "object",
"properties" : {
"scheme" : {
"type" : "string",
"description" : "The scheme of the persistent identifier for the result (i.e. doi). If the pid is here it means the information for the pid has been collected from an authority for that pid type (i.e. Crossref/Datacite for doi). The set of authoritative pid is: doi when collected from Crossref or Datacite pmid when collected from EuroPubmed, arxiv when collected from arXiv, handle from the repositories"
},
"value" : {
"type" : "string",
"description" : "The value expressed in the scheme (i.e. 10.1000/182)"
}
}
}
},
"type" : "object",
"properties" : {
"author" : {
"type" : "array",
"items" : {
"type" : "object",
"properties" : {
"fullname" : {
"type" : "string"
},
"name" : {
"type" : "string"
},
"pid" : {
"type" : "object",
"properties" : {
"id" : {
"type" : "object",
"properties" : {
"scheme" : {
"type" : "string",
"description" : "The author's pid scheme. OpenAIRE currently supports 'ORCID'"
},
"value" : {
"type" : "string",
"description" : "The author's pid value in that scheme (i.e. 0000-1111-2222-3333)"
}
}
},
"provenance" : {
"allOf" : [ {
"$ref" : "#/definitions/Provenance"
}, {
"description" : "The reason why the pid was associated to the author"
} ]
}
},
"description" : "The author's persistent identifiers"
},
"rank" : {
"type" : "integer"
},
"surname" : {
"type" : "string"
}
}
}
},
"bestaccessright" : {
"type" : "object",
"properties" : {
"code" : {
"type" : "string",
"description" : "COAR access mode code: http://vocabularies.coar-repositories.org/documentation/access_rights/"
},
"label" : {
"type" : "string",
"description" : "Label for the access mode"
},
"scheme" : {
"type" : "string",
"description" : "Scheme of reference for access right code. Always set to COAR access rights vocabulary: http://vocabularies.coar-repositories.org/documentation/access_rights/"
}
},
"description" : "The openest of the access rights of this result."
},
"codeRepositoryUrl" : {
"type" : "string",
"description" : "Only for results with type 'software': the URL to the repository with the source code"
},
"collectedfrom" : {
"description" : "Information about the sources from which the record has been collected",
"type" : "array",
"items" : {
"allOf" : [ {
"$ref" : "#/definitions/CfHbKeyValue"
}, {
"description" : "Information about the sources from which the record has been collected"
} ]
}
},
"contactgroup" : {
"description" : "Only for results with type 'software': Information on the group responsible for providing further information regarding the resource",
"type" : "array",
"items" : {
"type" : "string",
"description" : "Only for results with type 'software': Information on the group responsible for providing further information regarding the resource"
}
},
"contactperson" : {
"description" : "Only for results with type 'software': Information on the person responsible for providing further information regarding the resource",
"type" : "array",
"items" : {
"type" : "string",
"description" : "Only for results with type 'software': Information on the person responsible for providing further information regarding the resource"
}
},
"container" : {
"type" : "object",
"properties" : {
"conferencedate" : {
"type" : "string"
},
"conferenceplace" : {
"type" : "string"
},
"edition" : {
"type" : "string",
"description" : "Edition of the journal or conference proceeding"
},
"ep" : {
"type" : "string",
"description" : "End page"
},
"iss" : {
"type" : "string",
"description" : "Journal issue number"
},
"issnLinking" : {
"type" : "string"
},
"issnOnline" : {
"type" : "string"
},
"issnPrinted" : {
"type" : "string"
},
"name" : {
"type" : "string",
"description" : "Name of the journal or conference"
},
"sp" : {
"type" : "string",
"description" : "Start page"
},
"vol" : {
"type" : "string",
"description" : "Volume"
}
},
"description" : "Container has information about the conference or journal where the result has been presented or published"
},
"context" : {
"description" : "Reference to a relevant research infrastructure, initiative or community (RI/RC) among those collaborating with OpenAIRE. Please see https://connect.openaire.eu",
"type" : "array",
"items" : {
"type" : "object",
"properties" : {
"code" : {
"type" : "string",
"description" : "Code identifying the RI/RC"
},
"label" : {
"type" : "string",
"description" : "Label of the RI/RC"
},
"provenance" : {
"description" : "Why this result is associated to the RI/RC.",
"type" : "array",
"items" : {
"allOf" : [ {
"$ref" : "#/definitions/Provenance"
}, {
"description" : "Why this result is associated to the RI/RC."
} ]
}
}
},
"description" : "Reference to a relevant research infrastructure, initiative or community (RI/RC) among those collaborating with OpenAIRE. Please see https://connect.openaire.eu"
}
},
"contributor" : {
"description" : "Contributors for the result",
"type" : "array",
"items" : {
"type" : "string",
"description" : "Contributors for the result"
}
},
"country" : {
"description" : "The list of countries associated to this result",
"type" : "array",
"items" : {
"type" : "object",
"properties" : {
"code" : {
"type" : "string",
"description" : "ISO 3166-1 alpha-2 country code (i.e. IT)"
},
"label" : {
"type" : "string",
"description" : "The label for that code (i.e. Italy)"
},
"provenance" : {
"allOf" : [ {
"$ref" : "#/definitions/Provenance"
}, {
"description" : "Why this result is associated to the country."
} ]
}
},
"description" : "The list of countries associated to this result"
}
},
"coverage" : {
"type" : "array",
"items" : {
"type" : "string"
}
},
"dateofcollection" : {
"type" : "string",
"description" : "When OpenAIRE collected the record the last time"
},
"description" : {
"type" : "array",
"items" : {
"type" : "string"
}
},
"documentationUrl" : {
"description" : "Only for results with type 'software': URL to the software documentation",
"type" : "array",
"items" : {
"type" : "string",
"description" : "Only for results with type 'software': URL to the software documentation"
}
},
"embargoenddate" : {
"type" : "string",
"description" : "Date when the embargo ends and this result turns Open Access"
},
"format" : {
"type" : "array",
"items" : {
"type" : "string"
}
},
"geolocation" : {
"description" : "Geolocation information",
"type" : "array",
"items" : {
"type" : "object",
"properties" : {
"box" : {
"type" : "string"
},
"place" : {
"type" : "string"
},
"point" : {
"type" : "string"
}
},
"description" : "Geolocation information"
}
},
"id" : {
"type" : "string",
"description" : "The OpenAIRE identifiers for this result"
},
"instance" : {
"description" : "Each instance is one specific materialisation or version of the result. For example, you can have one result with three instance: one is the pre-print, one is the post-print, one is te published version",
"type" : "array",
"items" : {
"type" : "object",
"properties" : {
"accessright" : {
"type" : "object",
"properties" : {
"code" : {
"type" : "string",
"description" : "COAR access mode code: http://vocabularies.coar-repositories.org/documentation/access_rights/"
},
"label" : {
"type" : "string",
"description" : "Label for the access mode"
},
"openAccessRoute" : {
"type" : "string",
"enum" : [ "gold", "green", "hybrid", "bronze" ]
},
"scheme" : {
"type" : "string",
"description" : "Scheme of reference for access right code. Always set to COAR access rights vocabulary: http://vocabularies.coar-repositories.org/documentation/access_rights/"
}
},
"description" : "The accessRights for this materialization of the result"
},
"alternateIdentifier" : {
"description" : "All the identifiers other than pids forged by an authorithy for the pid type (i.e. Crossref for DOIs",
"type" : "array",
"items" : {
"type" : "object",
"properties" : {
"scheme" : {
"type" : "string",
"description" : "The scheme of the identifier. It can be a persistent identifier (i.e. doi). If it is present in the alternate identifiers it means it has not been forged by an authority for that pid. For example we collect metadata from an institutional repository that provides as identifier for the result also the doi"
},
"value" : {
"type" : "string",
"description" : "The value expressed in the scheme"
}
},
"description" : "All the identifiers other than pids forged by an authorithy for the pid type (i.e. Crossref for DOIs"
}
},
"articleprocessingcharge" : {
"type" : "object",
"properties" : {
"amount" : {
"type" : "string"
},
"currency" : {
"type" : "string"
}
},
"description" : "The money spent to make this book or article available in Open Access. Source for this information is the OpenAPC initiative."
},
"collectedfrom" : {
"allOf" : [ {
"$ref" : "#/definitions/CfHbKeyValue"
}, {
"description" : "Information about the source from which the record has been collected"
} ]
},
"hostedby" : {
"allOf" : [ {
"$ref" : "#/definitions/CfHbKeyValue"
}, {
"description" : "Information about the source from which the instance can be viewed or downloaded."
} ]
},
"license" : {
"type" : "string"
},
"measures" : {
"description" : "Measures computed for this instance, for example Bip!Finder ones",
"type" : "array",
"items" : {
"type" : "object",
"properties" : {
"key" : {
"type" : "string",
"description" : "The measure (i.e. popularity)"
},
"value" : {
"type" : "string",
"description" : "The value for that measure"
}
},
"description" : "Measures computed for this instance, for example Bip!Finder ones"
}
},
"pid" : {
"type" : "array",
"items" : {
"$ref" : "#/definitions/ResultPid"
}
},
"publicationdate" : {
"type" : "string",
"description" : "Date of the research product"
},
"refereed" : {
"type" : "string",
"description" : "If this instance has been peer-reviewed or not. Allowed values are peerReviewed, nonPeerReviewed, UNKNOWN (as defined in https://api.openaire.eu/vocabularies/dnet:review_levels)"
},
"type" : {
"type" : "string",
"description" : "The specific sub-type of this instance (see https://api.openaire.eu/vocabularies/dnet:result_typologies following the links)"
},
"url" : {
"description" : "URLs to the instance. They may link to the actual full-text or to the landing page at the hosting source. ",
"type" : "array",
"items" : {
"type" : "string",
"description" : "URLs to the instance. They may link to the actual full-text or to the landing page at the hosting source. "
}
}
},
"description" : "Each instance is one specific materialisation or version of the result. For example, you can have one result with three instance: one is the pre-print, one is the post-print, one is te published version"
}
},
"language" : {
"type" : "object",
"properties" : {
"code" : {
"type" : "string",
"description" : "alpha-3/ISO 639-2 code of the language"
},
"label" : {
"type" : "string",
"description" : "Language label in English"
}
}
},
"lastupdatetimestamp" : {
"type" : "integer",
"description" : "Timestamp of last update of the record in OpenAIRE"
},
"maintitle" : {
"type" : "string",
"description" : "A name or title by which a scientific result is known. May be the title of a publication, of a dataset or the name of a piece of software."
},
"originalId" : {
"description" : "Identifiers of the record at the original sources",
"type" : "array",
"items" : {
"type" : "string",
"description" : "Identifiers of the record at the original sources"
}
},
"pid" : {
"description" : "Persistent identifiers of the result",
"type" : "array",
"items" : {
"allOf" : [ {
"$ref" : "#/definitions/ResultPid"
}, {
"description" : "Persistent identifiers of the result"
} ]
}
},
"programmingLanguage" : {
"type" : "string",
"description" : "Only for results with type 'software': the programming language"
},
"projects" : {
"description" : "List of projects (i.e. grants) that (co-)funded the production ofn the research results",
"type" : "array",
"items" : {
"type" : "object",
"properties" : {
"acronym" : {
"type" : "string",
"description" : "The acronym of the project"
},
"code" : {
"type" : "string",
"description" : "The grant agreement number"
},
"funder" : {
"type" : "object",
"properties" : {
"fundingStream" : {
"type" : "string",
"description" : "Stream of funding (e.g. for European Commission can be H2020 or FP7)"
},
"jurisdiction" : {
"type" : "string",
"description" : "Geographical jurisdiction (e.g. for European Commission is EU, for Croatian Science Foundation is HR)"
},
"name" : {
"type" : "string",
"description" : "The name of the funder (European Commission)"
},
"shortName" : {
"type" : "string",
"description" : "The short name of the funder (EC)"
}
},
"description" : "Information about the funder funding the project"
},
"id" : {
"type" : "string",
"description" : "The OpenAIRE id for the project"
},
"provenance" : {
"$ref" : "#/definitions/Provenance"
},
"title" : {
"type" : "string"
},
"validated" : {
"type" : "object",
"properties" : {
"validatedByFunder" : {
"type" : "boolean"
},
"validationDate" : {
"type" : "string"
}
}
}
},
"description" : "List of projects (i.e. grants) that (co-)funded the production ofn the research results"
}
},
"publicationdate" : {
"type" : "string",
"description" : "Main date of the research product: typically the publication or issued date. In case of a research result with different versions with different dates, the date of the result is selected as the most frequent well-formatted date. If not available, then the most recent and complete date among those that are well-formatted. For statistics, the year is extracted and the result is counted only among the result of that year. Example: Pre-print date: 2019-02-03, Article date provided by repository: 2020-02, Article date provided by Crossref: 2020, OpenAIRE will set as date 2019-02-03, because its the most recent among the complete and well-formed dates. If then the repository updates the metadata and set a complete date (e.g. 2020-02-12), then this will be the new date for the result because it becomes the most recent most complete date. However, if OpenAIRE then collects the pre-print from another repository with date 2019-02-03, then this will be the “winning date” because it becomes the most frequent well-formatted date."
},
"publisher" : {
"type" : "string",
"description" : "The name of the entity that holds, archives, publishes prints, distributes, releases, issues, or produces the resource."
},
"size" : {
"type" : "string",
"description" : "Only for results with type 'dataset': the declared size of the dataset"
},
"source" : {
"description" : "See definition of Dublin Core field dc:source",
"type" : "array",
"items" : {
"type" : "string",
"description" : "See definition of Dublin Core field dc:source"
}
},
"subjects" : {
"description" : "Keywords associated to the result",
"type" : "array",
"items" : {
"type" : "object",
"properties" : {
"provenance" : {
"allOf" : [ {
"$ref" : "#/definitions/Provenance"
}, {
"description" : "Why this subject is associated to the result"
} ]
},
"subject" : {
"type" : "object",
"properties" : {
"scheme" : {
"type" : "string",
"description" : "OpenAIRE subject classification scheme (https://api.openaire.eu/vocabularies/dnet:subject_classification_typologies)."
},
"value" : {
"type" : "string",
"description" : "The value for the subject in the selected scheme. When the scheme is 'keyword', it means that the subject is free-text (i.e. not a term from a controlled vocabulary)."
}
}
}
},
"description" : "Keywords associated to the result"
}
},
"subtitle" : {
"type" : "string",
"description" : "Explanatory or alternative name by which a scientific result is known."
},
"tool" : {
"description" : "Only for results with type 'other': tool useful for the interpretation and/or re-used of the research product",
"type" : "array",
"items" : {
"type" : "string",
"description" : "Only for results with type 'other': tool useful for the interpretation and/or re-used of the research product"
}
},
"type" : {
"type" : "string",
"description" : "Type of the result: one of 'publication', 'dataset', 'software', 'other' (see also https://api.openaire.eu/vocabularies/dnet:result_typologies)"
},
"version" : {
"type" : "string",
"description" : "Version of the result"
}
}
}
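
A hedged sketch of how a minimal record conforming to the schema above could be assembled with Jackson; all field values are illustrative, only a small subset of the optional properties is set, and the snippet is assumed to run inside a method that declares throws JsonProcessingException:

import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ArrayNode;
import com.fasterxml.jackson.databind.node.ObjectNode;

ObjectMapper mapper = new ObjectMapper();
ObjectNode result = mapper.createObjectNode();
result.put("id", "50|doi_________::ab12cd34"); // illustrative OpenAIRE id
result.put("type", "publication");
result.put("maintitle", "An example title");
result.put("publicationdate", "2019-02-03");
ArrayNode pids = result.putArray("pid");
pids.addObject().put("scheme", "doi").put("value", "10.1000/182"); // value reused from the schema's own example
System.out.println(mapper.writerWithDefaultPrettyPrinter().writeValueAsString(result));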

View File

@@ -9,28 +9,14 @@ import com.github.imifou.jsonschema.module.addon.AddonModule;
import com.github.victools.jsonschema.generator.*;
import eu.dnetlib.dhp.ExecCreateSchemas;
import eu.dnetlib.dhp.oa.model.graph.GraphResult;
import eu.dnetlib.dhp.eosc.model.Relation;
import eu.dnetlib.dhp.eosc.model.Result;
//@Disabled
class GenerateJsonSchema {
@Test
void generateSchema() {
SchemaGeneratorConfigBuilder configBuilder = new SchemaGeneratorConfigBuilder(SchemaVersion.DRAFT_7,
OptionPreset.PLAIN_JSON)
.with(Option.SCHEMA_VERSION_INDICATOR)
.without(Option.NONPUBLIC_NONSTATIC_FIELDS_WITHOUT_GETTERS);
configBuilder.forFields().withDescriptionResolver(field -> "Description of " + field.getDeclaredName());
SchemaGeneratorConfig config = configBuilder.build();
SchemaGenerator generator = new SchemaGenerator(config);
JsonNode jsonSchema = generator.generateSchema(GraphResult.class);
System.out.println(jsonSchema.toString());
}
@Test
void generateSchema2() {
void generateSchema3() throws JsonProcessingException {
ObjectMapper objectMapper = new ObjectMapper();
AddonModule module = new AddonModule();
@@ -41,11 +27,27 @@ class GenerateJsonSchema {
.without(Option.NONPUBLIC_NONSTATIC_FIELDS_WITHOUT_GETTERS);
SchemaGeneratorConfig config = configBuilder.build();
SchemaGenerator generator = new SchemaGenerator(config);
JsonNode jsonSchema = generator.generateSchema(GraphResult.class);
JsonNode jsonSchema = generator.generateSchema(Result.class);
System.out.println(jsonSchema.toString());
System.out.println(new ObjectMapper().writeValueAsString(jsonSchema));
}
@Test
void generateSchemaEoscRelation() throws JsonProcessingException {
ObjectMapper objectMapper = new ObjectMapper();
AddonModule module = new AddonModule();
SchemaGeneratorConfigBuilder configBuilder = new SchemaGeneratorConfigBuilder(objectMapper,
SchemaVersion.DRAFT_7, OptionPreset.PLAIN_JSON)
.with(module)
.with(Option.SCHEMA_VERSION_INDICATOR)
.without(Option.NONPUBLIC_NONSTATIC_FIELDS_WITHOUT_GETTERS);
SchemaGeneratorConfig config = configBuilder.build();
SchemaGenerator generator = new SchemaGenerator(config);
JsonNode jsonSchema = generator.generateSchema(Result.class);
System.out.println(new ObjectMapper().writeValueAsString(jsonSchema));
}
@Test
void generateJsonSchema3() throws IOException {

View File

@@ -6,6 +6,7 @@
<artifactId>dhp-graph-dump</artifactId>
<groupId>eu.dnetlib.dhp</groupId>
<version>1.2.5-SNAPSHOT</version>
<relativePath>../pom.xml</relativePath>
</parent>
<modelVersion>4.0.0</modelVersion>
@@ -53,7 +54,17 @@
<artifactId>dump-schema</artifactId>
<version>1.2.5-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>eu.dnetlib.dhp</groupId>
<artifactId>api</artifactId>
<version>1.2.5-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>eu.dnetlib.dhp</groupId>
<artifactId>api</artifactId>
<version>1.2.5-SNAPSHOT</version>
<scope>compile</scope>
</dependency>
</dependencies>

View File

@@ -0,0 +1,190 @@
package eu.dnetlib.dhp.oa.common;
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.Serializable;
import java.util.Optional;
import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveOutputStream;
import org.apache.commons.io.IOUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import eu.dnetlib.dhp.application.ArgumentApplicationParser;
public class MakeTarArchive implements Serializable {
private static final Logger log = LoggerFactory.getLogger(MakeTarArchive.class);
public static void main(String[] args) throws Exception {
String jsonConfiguration = IOUtils
.toString(
MakeTarArchive.class
.getResourceAsStream(
"/eu/dnetlib/dhp/eosc_input_maketar_parameters.json"));
final ArgumentApplicationParser parser = new ArgumentApplicationParser(jsonConfiguration);
parser.parseArgument(args);
final String outputPath = parser.get("hdfsPath");
log.info("hdfsPath: {}", outputPath);
final String hdfsNameNode = parser.get("nameNode");
log.info("nameNode: {}", hdfsNameNode);
final String inputPath = parser.get("sourcePath");
log.info("input path : {}", inputPath);
final int gBperSplit = Optional
.ofNullable(parser.get("splitSize"))
.map(Integer::valueOf)
.orElse(10);
final boolean rename = Optional
.ofNullable(parser.get("rename"))
.map(Boolean::valueOf)
.orElse(Boolean.FALSE);
Configuration conf = new Configuration();
conf.set("fs.defaultFS", hdfsNameNode);
FileSystem fileSystem = FileSystem.get(conf);
makeTArArchive(fileSystem, inputPath, outputPath, gBperSplit, rename);
}
public static void makeTArArchive(FileSystem fileSystem, String inputPath, String outputPath, int gBperSplit)
throws IOException {
makeTArArchive(fileSystem, inputPath, outputPath, gBperSplit, false);
}
public static void makeTArArchive(FileSystem fileSystem, String inputPath, String outputPath, int gBperSplit,
boolean rename)
throws IOException {
RemoteIterator<LocatedFileStatus> dirIterator = fileSystem.listLocatedStatus(new Path(inputPath));
while (dirIterator.hasNext()) {
LocatedFileStatus fileStatus = dirIterator.next();
Path p = fileStatus.getPath();
String pathString = p.toString();
String entity = pathString.substring(pathString.lastIndexOf("/") + 1);
MakeTarArchive.tarMaxSize(fileSystem, pathString, outputPath + "/" + entity, entity, gBperSplit, rename);
}
}
private static TarArchiveOutputStream getTar(FileSystem fileSystem, String outputPath) throws IOException {
Path hdfsWritePath = new Path(outputPath);
if (fileSystem.exists(hdfsWritePath)) {
fileSystem.delete(hdfsWritePath, true);
}
return new TarArchiveOutputStream(fileSystem.create(hdfsWritePath).getWrappedStream());
}
private static void write(FileSystem fileSystem, String inputPath, String outputPath, String dirName,
boolean rename)
throws IOException {
Path hdfsWritePath = new Path(outputPath);
if (fileSystem.exists(hdfsWritePath)) {
fileSystem.delete(hdfsWritePath, true);
}
try (TarArchiveOutputStream ar = new TarArchiveOutputStream(
fileSystem.create(hdfsWritePath).getWrappedStream())) {
RemoteIterator<LocatedFileStatus> iterator = fileSystem
.listFiles(
new Path(inputPath), true);
while (iterator.hasNext()) {
writeCurrentFile(fileSystem, dirName, iterator, ar, 0, rename);
}
}
}
public static void tarMaxSize(FileSystem fileSystem, String inputPath, String outputPath, String dir_name,
int gBperSplit, boolean rename) throws IOException {
final long bytesPerSplit = 1024L * 1024L * 1024L * gBperSplit;
long sourceSize = fileSystem.getContentSummary(new Path(inputPath)).getSpaceConsumed();
if (sourceSize < bytesPerSplit) {
write(fileSystem, inputPath, outputPath + ".tar", dir_name, rename);
} else {
int partNum = 0;
RemoteIterator<LocatedFileStatus> fileStatusListIterator = fileSystem
.listFiles(
new Path(inputPath), true);
boolean next = fileStatusListIterator.hasNext();
while (next) {
try (TarArchiveOutputStream ar = getTar(fileSystem, outputPath + "_" + (partNum + 1) + ".tar")) {
long currentSize = 0;
while (next && currentSize < bytesPerSplit) {
currentSize = writeCurrentFile(
fileSystem, dir_name, fileStatusListIterator, ar, currentSize, rename);
next = fileStatusListIterator.hasNext();
}
partNum += 1;
}
}
}
}
private static long writeCurrentFile(FileSystem fileSystem, String dirName,
RemoteIterator<LocatedFileStatus> fileStatusListIterator,
TarArchiveOutputStream ar, long currentSize, boolean rename) throws IOException {
LocatedFileStatus fileStatus = fileStatusListIterator.next();
Path p = fileStatus.getPath();
String pString = p.toString();
if (!pString.endsWith("_SUCCESS")) {
String name = pString.substring(pString.lastIndexOf("/") + 1);
if (name.startsWith("part-") && name.length() > 10) {
String tmp = name.substring(0, 10);
if (name.contains(".")) {
tmp += name.substring(name.indexOf("."));
}
name = tmp;
}
if (rename) {
if (name.endsWith(".txt.gz"))
name = name.replace(".txt.gz", ".json.gz");
}
TarArchiveEntry entry = new TarArchiveEntry(dirName + "/" + name);
entry.setSize(fileStatus.getLen());
currentSize += fileStatus.getLen();
ar.putArchiveEntry(entry);
InputStream is = fileSystem.open(fileStatus.getPath());
BufferedInputStream bis = new BufferedInputStream(is);
int count;
byte[] data = new byte[1024];
while ((count = bis.read(data, 0, data.length)) != -1) {
ar.write(data, 0, count);
}
bis.close();
ar.closeArchiveEntry();
}
return currentSize;
}
}
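
A usage sketch mirroring the main() wiring above (the name node URI, paths and split size are illustrative): one tar per sub-directory of the input path is produced, split into _N.tar parts above the size threshold, and with .txt.gz part files renamed to .json.gz when rename is true.

Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://nameservice1"); // illustrative name node
FileSystem fileSystem = FileSystem.get(conf);
// archive /workingDir/dump into /outputDir/tar, splitting every 10 GB
MakeTarArchive.makeTArArchive(fileSystem, "/workingDir/dump", "/outputDir/tar", 10, true);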

View File

@@ -43,7 +43,7 @@ public class Constants {
}
public enum DUMPTYPE {
COMPLETE("complete"), COMMUNITY("community"), FUNDER("funder");
COMPLETE("complete"), COMMUNITY("community"), FUNDER("funder"), EOSC("eosc");
private final String type;

View File

@@ -4,11 +4,7 @@ package eu.dnetlib.dhp.oa.graph.dump;
import static eu.dnetlib.dhp.common.SparkSessionSupport.runWithSparkSession;
import java.io.Serializable;
import java.util.List;
import java.util.Objects;
import java.util.Optional;
import java.util.Set;
import java.util.stream.Collectors;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.function.FilterFunction;
@@ -17,11 +13,11 @@ import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;
import eu.dnetlib.dhp.oa.graph.dump.community.CommunityMap;
import eu.dnetlib.dhp.eosc.model.Result;
import eu.dnetlib.dhp.oa.graph.dump.eosc.CommunityMap;
import eu.dnetlib.dhp.oa.graph.dump.eosc.Utils;
import eu.dnetlib.dhp.oa.graph.dump.exceptions.CardinalityTooHighException;
import eu.dnetlib.dhp.oa.graph.dump.exceptions.NoAvailableEntityTypeException;
import eu.dnetlib.dhp.oa.model.Result;
import eu.dnetlib.dhp.schema.oaf.Context;
import eu.dnetlib.dhp.schema.oaf.DataInfo;
import eu.dnetlib.dhp.schema.oaf.OafEntity;
@@ -32,9 +28,7 @@ import eu.dnetlib.dhp.schema.oaf.OafEntity;
public class DumpProducts implements Serializable {
public void run(Boolean isSparkSessionManaged, String inputPath, String outputPath, String communityMapPath,
Class<? extends OafEntity> inputClazz,
Class<? extends Result> outputClazz,
String dumpType) {
Class<? extends OafEntity> inputClazz) {
SparkConf conf = new SparkConf();
@@ -44,25 +38,23 @@ public class DumpProducts implements Serializable {
spark -> {
Utils.removeOutputDir(spark, outputPath);
execDump(
spark, inputPath, outputPath, communityMapPath, inputClazz, outputClazz, dumpType);
spark, inputPath, outputPath, communityMapPath, inputClazz);
});
}
public static <I extends OafEntity, O extends Result> void execDump(
public static <I extends OafEntity> void execDump(
SparkSession spark,
String inputPath,
String outputPath,
String communityMapPath,
Class<I> inputClazz,
Class<O> outputClazz,
String dumpType) {
Class<I> inputClazz) {
CommunityMap communityMap = Utils.getCommunityMap(spark, communityMapPath);
Utils
.readPath(spark, inputPath, inputClazz)
.map((MapFunction<I, O>) value -> execMap(value, communityMap, dumpType), Encoders.bean(outputClazz))
.filter((FilterFunction<O>) value -> value != null)
.map((MapFunction<I, Result>) value -> execMap(value, communityMap), Encoders.bean(Result.class))
.filter((FilterFunction<Result>) value -> value != null)
.write()
.mode(SaveMode.Overwrite)
.option("compression", "gzip")
@ -70,9 +62,8 @@ public class DumpProducts implements Serializable {
}
private static <I extends OafEntity, O extends Result> O execMap(I value,
CommunityMap communityMap,
String dumpType) throws NoAvailableEntityTypeException, CardinalityTooHighException {
private static <I extends OafEntity> Result execMap(I value,
CommunityMap communityMap) throws NoAvailableEntityTypeException, CardinalityTooHighException {
Optional<DataInfo> odInfo = Optional.ofNullable(value.getDataInfo());
if (odInfo.isPresent()) {
@ -83,29 +74,7 @@ public class DumpProducts implements Serializable {
return null;
}
if (Constants.DUMPTYPE.COMMUNITY.getType().equals(dumpType)) {
Set<String> communities = communityMap.keySet();
Optional<List<Context>> inputContext = Optional
.ofNullable(((eu.dnetlib.dhp.schema.oaf.Result) value).getContext());
if (!inputContext.isPresent()) {
return null;
}
List<String> toDumpFor = inputContext.get().stream().map(c -> {
if (communities.contains(c.getId())) {
return c.getId();
}
if (c.getId().contains("::") && communities.contains(c.getId().substring(0, c.getId().indexOf("::")))) {
return c.getId().substring(0, c.getId().indexOf("::"));
}
return null;
}).filter(Objects::nonNull).collect(Collectors.toList());
if (toDumpFor.isEmpty()) {
return null;
}
}
return (O) ResultMapper.map(value, communityMap, dumpType);
return ResultMapper.map(value, communityMap, null);
}
}
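A minimal sketch of the slimmed-down entry point (paths are hypothetical; outputClazz and dumpType are gone because the output type is now fixed to eu.dnetlib.dhp.eosc.model.Result):

new DumpProducts()
	.run(
		Boolean.TRUE, "/graph/publication", "/dump/publication", "/workingDir/communityMap",
		eu.dnetlib.dhp.schema.oaf.Publication.class);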

View File

@ -15,7 +15,7 @@ import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import eu.dnetlib.dhp.application.ArgumentApplicationParser;
import eu.dnetlib.dhp.common.MakeTarArchive;
import eu.dnetlib.dhp.oa.common.MakeTarArchive;
public class MakeTar implements Serializable {
@ -26,7 +26,7 @@ public class MakeTar implements Serializable {
.toString(
MakeTar.class
.getResourceAsStream(
"/eu/dnetlib/dhp/oa/graph/dump/input_maketar_parameters.json"));
"/eu/dnetlib/dhp/oa/graph/dump/eosc_input_maketar_parameters.json"));
final ArgumentApplicationParser parser = new ArgumentApplicationParser(jsonConfiguration);
parser.parseArgument(args);
@ -66,7 +66,7 @@ public class MakeTar implements Serializable {
String pathString = p.toString();
String entity = pathString.substring(pathString.lastIndexOf("/") + 1);
MakeTarArchive.tarMaxSize(fileSystem, pathString, outputPath + "/" + entity, entity, gBperSplit);
MakeTarArchive.tarMaxSize(fileSystem, pathString, outputPath + "/" + entity, entity, gBperSplit, true);
}
}

View File

@ -6,24 +6,26 @@ import java.util.*;
import java.util.stream.Collectors;
import org.apache.commons.lang3.StringUtils;
import org.apache.spark.api.java.function.FilterFunction;
import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.Column;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import eu.dnetlib.dhp.eosc.model.*;
import eu.dnetlib.dhp.eosc.model.AccessRight;
import eu.dnetlib.dhp.eosc.model.Author;
import eu.dnetlib.dhp.eosc.model.Context;
import eu.dnetlib.dhp.eosc.model.GeoLocation;
import eu.dnetlib.dhp.eosc.model.Measure;
import eu.dnetlib.dhp.eosc.model.OpenAccessRoute;
import eu.dnetlib.dhp.eosc.model.Provenance;
import eu.dnetlib.dhp.eosc.model.Result;
import eu.dnetlib.dhp.oa.graph.dump.eosc.MasterDuplicate;
import eu.dnetlib.dhp.oa.graph.dump.exceptions.CardinalityTooHighException;
import eu.dnetlib.dhp.oa.graph.dump.exceptions.NoAvailableEntityTypeException;
import eu.dnetlib.dhp.oa.model.*;
import eu.dnetlib.dhp.oa.model.AccessRight;
import eu.dnetlib.dhp.oa.model.Author;
import eu.dnetlib.dhp.oa.model.GeoLocation;
import eu.dnetlib.dhp.oa.model.Instance;
import eu.dnetlib.dhp.oa.model.Measure;
import eu.dnetlib.dhp.oa.model.OpenAccessRoute;
import eu.dnetlib.dhp.oa.model.Result;
import eu.dnetlib.dhp.oa.model.community.CfHbKeyValue;
import eu.dnetlib.dhp.oa.model.community.CommunityInstance;
import eu.dnetlib.dhp.oa.model.community.CommunityResult;
import eu.dnetlib.dhp.oa.model.community.Context;
import eu.dnetlib.dhp.oa.model.graph.GraphResult;
import eu.dnetlib.dhp.schema.common.ModelConstants;
import eu.dnetlib.dhp.schema.oaf.*;
@ -31,277 +33,45 @@ public class ResultMapper implements Serializable {
private static final Logger log = LoggerFactory.getLogger(ResultMapper.class);
public static <E extends eu.dnetlib.dhp.schema.oaf.OafEntity> Result map(
E in, Map<String, String> communityMap, String dumpType)
E in, Map<String, String> communityMap,
List<MasterDuplicate> eoscIds)
throws NoAvailableEntityTypeException, CardinalityTooHighException {
Result out;
if (Constants.DUMPTYPE.COMPLETE.getType().equals(dumpType)) {
out = new GraphResult();
} else {
out = new CommunityResult();
}
log.info("*****************" + eoscIds.size());
Result out = new Result();
eu.dnetlib.dhp.schema.oaf.Result input = (eu.dnetlib.dhp.schema.oaf.Result) in;
Optional<eu.dnetlib.dhp.schema.oaf.Qualifier> ort = Optional.ofNullable(input.getResulttype());
if (ort.isPresent()) {
try {
addTypeSpecificInformation(out, input, ort);
Optional
.ofNullable(input.getAuthor())
.ifPresent(
ats -> out.setAuthor(ats.stream().map(ResultMapper::getAuthor).collect(Collectors.toList())));
// I do not map Access Right UNKNOWN or OTHER
Optional<eu.dnetlib.dhp.schema.oaf.Qualifier> oar = Optional.ofNullable(input.getBestaccessright());
if (oar.isPresent() && Constants.ACCESS_RIGHTS_COAR_MAP.containsKey(oar.get().getClassid())) {
String code = Constants.ACCESS_RIGHTS_COAR_MAP.get(oar.get().getClassid());
out
.setBestaccessright(
BestAccessRight
.newInstance(
code,
Constants.COAR_CODE_LABEL_MAP.get(code),
Constants.COAR_ACCESS_RIGHT_SCHEMA));
}
final List<String> contributorList = new ArrayList<>();
Optional
.ofNullable(input.getContributor())
.ifPresent(value -> value.stream().forEach(c -> contributorList.add(c.getValue())));
out.setContributor(contributorList);
Optional
.ofNullable(input.getCountry())
.ifPresent(
value -> out
.setCountry(
value
.stream()
.map(
c -> {
if (c.getClassid().equals((ModelConstants.UNKNOWN))) {
return null;
}
ResultCountry country = new ResultCountry();
country.setCode(c.getClassid());
country.setLabel(c.getClassname());
Optional
.ofNullable(c.getDataInfo())
.ifPresent(
provenance -> country
.setProvenance(
Provenance
.newInstance(
provenance
.getProvenanceaction()
.getClassname(),
c.getDataInfo().getTrust())));
return country;
})
.filter(Objects::nonNull)
.collect(Collectors.toList())));
final List<String> coverageList = new ArrayList<>();
Optional
.ofNullable(input.getCoverage())
.ifPresent(value -> value.stream().forEach(c -> coverageList.add(c.getValue())));
out.setCoverage(coverageList);
addTypeSpecificInformation(out, input, ort.get());
mapAuthor(out, input);
mapAccessRight(out, input);
mapContributor(out, input);
mapCountry(out, input);
mapCoverage(out, input);
out.setDateofcollection(input.getDateofcollection());
final List<String> descriptionList = new ArrayList<>();
Optional
.ofNullable(input.getDescription())
.ifPresent(value -> value.forEach(d -> descriptionList.add(d.getValue())));
out.setDescription(descriptionList);
Optional<Field<String>> oStr = Optional.ofNullable(input.getEmbargoenddate());
if (oStr.isPresent()) {
out.setEmbargoenddate(oStr.get().getValue());
}
final List<String> formatList = new ArrayList<>();
Optional
.ofNullable(input.getFormat())
.ifPresent(value -> value.stream().forEach(f -> formatList.add(f.getValue())));
out.setFormat(formatList);
mapDescription(out, input);
mapEmbargo(out, input);
mapMeasure(out, input);
mapFormat(out, input);
out.setId(input.getId());
out.setOriginalId(new ArrayList<>());
Optional
.ofNullable(input.getOriginalId())
.ifPresent(
v -> out
.setOriginalId(
input
.getOriginalId()
.stream()
.filter(s -> !s.startsWith("50|"))
.collect(Collectors.toList())));
Optional<List<eu.dnetlib.dhp.schema.oaf.Instance>> oInst = Optional
.ofNullable(input.getInstance());
if (oInst.isPresent()) {
if (Constants.DUMPTYPE.COMPLETE.getType().equals(dumpType)) {
((GraphResult) out)
.setInstance(
oInst.get().stream().map(ResultMapper::getGraphInstance).collect(Collectors.toList()));
} else {
((CommunityResult) out)
.setInstance(
oInst
.get()
.stream()
.map(ResultMapper::getCommunityInstance)
.collect(Collectors.toList()));
}
}
Optional<eu.dnetlib.dhp.schema.oaf.Qualifier> oL = Optional.ofNullable(input.getLanguage());
if (oL.isPresent()) {
eu.dnetlib.dhp.schema.oaf.Qualifier language = oL.get();
out.setLanguage(Language.newInstance(language.getClassid(), language.getClassname()));
}
Optional<Long> oLong = Optional.ofNullable(input.getLastupdatetimestamp());
if (oLong.isPresent()) {
out.setLastupdatetimestamp(oLong.get());
}
Optional<List<StructuredProperty>> otitle = Optional.ofNullable(input.getTitle());
if (otitle.isPresent()) {
List<StructuredProperty> iTitle = otitle
.get()
.stream()
.filter(t -> t.getQualifier().getClassid().equalsIgnoreCase("main title"))
.collect(Collectors.toList());
if (!iTitle.isEmpty()) {
out.setMaintitle(iTitle.get(0).getValue());
}
iTitle = otitle
.get()
.stream()
.filter(t -> t.getQualifier().getClassid().equalsIgnoreCase("subtitle"))
.collect(Collectors.toList());
if (!iTitle.isEmpty()) {
out.setSubtitle(iTitle.get(0).getValue());
}
}
Optional
.ofNullable(input.getPid())
.ifPresent(
value -> out
.setPid(
value
.stream()
.map(
p -> ResultPid
.newInstance(p.getQualifier().getClassid(), p.getValue()))
.collect(Collectors.toList())));
oStr = Optional.ofNullable(input.getDateofacceptance());
if (oStr.isPresent()) {
out.setPublicationdate(oStr.get().getValue());
}
oStr = Optional.ofNullable(input.getPublisher());
if (oStr.isPresent()) {
out.setPublisher(oStr.get().getValue());
}
Optional
.ofNullable(input.getSource())
.ifPresent(
value -> out.setSource(value.stream().map(Field::getValue).collect(Collectors.toList())));
List<Subject> subjectList = new ArrayList<>();
Optional
.ofNullable(input.getSubject())
.ifPresent(
value -> value
.forEach(s -> subjectList.add(getSubject(s))));
out.setSubjects(subjectList);
mapOriginalId(out, input);
mapInstance(out, input, eoscIds);
mapLanguage(out, input);
mapLastUpdateTimestamp(out, input);
mapTitle(out, input);
mapPid(out, input);
mapAcceptanceDate(out, input);
mapPublisher(out, input);
mapSource(out, input);
mapSubject(out, input);
out.setType(input.getResulttype().getClassid());
mapContext(communityMap, out, input);
mapCollectedfrom(out, input);
if (!Constants.DUMPTYPE.COMPLETE.getType().equals(dumpType)) {
((CommunityResult) out)
.setCollectedfrom(
input
.getCollectedfrom()
.stream()
.map(cf -> CfHbKeyValue.newInstance(cf.getKey(), cf.getValue()))
.collect(Collectors.toList()));
Set<String> communities = communityMap.keySet();
List<Context> contextList = Optional
.ofNullable(
input
.getContext())
.map(
value -> value
.stream()
.map(c -> {
String communityId = c.getId();
if (communityId.contains("::")) {
communityId = communityId.substring(0, communityId.indexOf("::"));
}
if (communities.contains(communityId)) {
Context context = new Context();
context.setCode(communityId);
context.setLabel(communityMap.get(communityId));
Optional<List<DataInfo>> dataInfo = Optional.ofNullable(c.getDataInfo());
if (dataInfo.isPresent()) {
List<Provenance> provenance = new ArrayList<>();
provenance
.addAll(
dataInfo
.get()
.stream()
.map(
di -> Optional
.ofNullable(di.getProvenanceaction())
.map(
provenanceaction -> Provenance
.newInstance(
provenanceaction.getClassname(),
di.getTrust()))
.orElse(null))
.filter(Objects::nonNull)
.collect(Collectors.toSet()));
try {
context.setProvenance(getUniqueProvenance(provenance));
} catch (NoAvailableEntityTypeException e) {
e.printStackTrace();
}
}
return context;
}
return null;
})
.filter(Objects::nonNull)
.collect(Collectors.toList()))
.orElse(new ArrayList<>());
if (!contextList.isEmpty()) {
Set<Integer> hashValue = new HashSet<>();
List<Context> remainingContext = new ArrayList<>();
contextList.forEach(c -> {
if (!hashValue.contains(c.hashCode())) {
remainingContext.add(c);
hashValue.add(c.hashCode());
}
});
((CommunityResult) out).setContext(remainingContext);
}
}
} catch (ClassCastException cce) {
return out;
return null;
}
}
@ -309,9 +79,362 @@ public class ResultMapper implements Serializable {
}
private static void mapCollectedfrom(Result out, eu.dnetlib.dhp.schema.oaf.Result input) {
out
.setCollectedfrom(
input
.getCollectedfrom()
.stream()
.map(cf -> CfHbKeyValue.newInstance(cf.getKey(), cf.getValue()))
.collect(Collectors.toList()));
}
private static void mapFulltext(Result out, eu.dnetlib.dhp.schema.oaf.Result input) {
if (Optional.ofNullable(input.getFulltext()).isPresent() && !input.getFulltext().isEmpty())
out.setFulltext(input.getFulltext().stream().map(ft -> ft.getValue()).collect(Collectors.toList()));
}
private static void mapContext(Map<String, String> communityMap, Result out,
eu.dnetlib.dhp.schema.oaf.Result input) {
Set<String> communities = communityMap.keySet();
List<Context> contextList = Optional
.ofNullable(
input
.getContext())
.map(
value -> value
.stream()
.map(c -> {
String communityId = c.getId();
if (communityId.contains("::")) {
communityId = communityId.substring(0, communityId.indexOf("::"));
}
if (communities.contains(communityId)) {
Context context = new Context();
context.setCode(communityId);
context.setLabel(communityMap.get(communityId));
Optional<List<DataInfo>> dataInfo = Optional.ofNullable(c.getDataInfo());
if (dataInfo.isPresent()) {
List<Provenance> provenance = new ArrayList<>();
provenance
.addAll(
dataInfo
.get()
.stream()
.map(
di -> Optional
.ofNullable(di.getProvenanceaction())
.map(
provenanceaction -> Provenance
.newInstance(
provenanceaction.getClassname(),
di.getTrust()))
.orElse(null))
.filter(Objects::nonNull)
.collect(Collectors.toSet()));
try {
context.setProvenance(getUniqueProvenance(provenance));
} catch (NoAvailableEntityTypeException e) {
e.printStackTrace();
}
}
return context;
}
return null;
})
.filter(Objects::nonNull)
.collect(Collectors.toList()))
.orElse(new ArrayList<>());
if (!contextList.isEmpty()) {
Set<Integer> hashValue = new HashSet<>();
List<Context> remainingContext = new ArrayList<>();
contextList.forEach(c -> {
if (!hashValue.contains(c.hashCode())) {
remainingContext.add(c);
hashValue.add(c.hashCode());
}
});
out.setContext(remainingContext);
}
}
private static void mapSubject(Result out, eu.dnetlib.dhp.schema.oaf.Result input) {
if (Optional.ofNullable(input.getSubject()).isPresent()) {
out.setSubject(createSubjectMap(input));
out
.setKeywords(
input
.getSubject()
.stream()
.filter(
s -> s.getQualifier().getClassid().equalsIgnoreCase("keyword") &&
!s.getValue().equalsIgnoreCase("EOSC::RO-crate"))
.map(s -> s.getValue())
.collect(Collectors.toList()));
if (Optional.ofNullable(input.getEoscifguidelines()).isPresent()) {
out
.setEoscIF(
input
.getEoscifguidelines()
.stream()
.map(
eig -> EoscInteroperabilityFramework
.newInstance(
eig.getCode(), eig.getLabel(), eig.getUrl(),
eig.getSemanticRelation()))
.collect(Collectors.toList()));
}
}
}
private static void mapSource(Result out, eu.dnetlib.dhp.schema.oaf.Result input) {
Optional
.ofNullable(input.getSource())
.ifPresent(
value -> out.setSource(value.stream().map(Field::getValue).collect(Collectors.toList())));
}
private static void mapPublisher(Result out, eu.dnetlib.dhp.schema.oaf.Result input) {
if (Optional.ofNullable(input.getPublisher()).isPresent()) {
out.setPublisher(input.getPublisher().getValue());
}
}
private static void mapAcceptanceDate(Result out, eu.dnetlib.dhp.schema.oaf.Result input) {
if (Optional.ofNullable(input.getDateofacceptance()).isPresent()) {
out.setPublicationdate(input.getDateofacceptance().getValue());
}
}
private static void mapPid(Result out, eu.dnetlib.dhp.schema.oaf.Result input) {
Optional
.ofNullable(input.getPid())
.ifPresent(
value -> out
.setPid(
value
.stream()
.map(
p -> ResultPid
.newInstance(p.getQualifier().getClassid(), p.getValue()))
.collect(Collectors.toList())));
}
private static void mapTitle(Result out, eu.dnetlib.dhp.schema.oaf.Result input) {
if (Optional.ofNullable(input.getTitle()).isPresent()) {
List<StructuredProperty> iTitle = input
.getTitle()
.stream()
.filter(t -> t.getQualifier().getClassid().equalsIgnoreCase("main title"))
.collect(Collectors.toList());
if (!iTitle.isEmpty()) {
out.setMaintitle(iTitle.get(0).getValue());
}
iTitle = input
.getTitle()
.stream()
.filter(t -> t.getQualifier().getClassid().equalsIgnoreCase("subtitle"))
.collect(Collectors.toList());
if (!iTitle.isEmpty()) {
out.setSubtitle(iTitle.get(0).getValue());
}
}
}
private static void mapLastUpdateTimestamp(Result out, eu.dnetlib.dhp.schema.oaf.Result input) {
if (Optional.ofNullable(input.getLastupdatetimestamp()).isPresent()) {
out.setLastupdatetimestamp(input.getLastupdatetimestamp());
}
}
private static void mapLanguage(Result out, eu.dnetlib.dhp.schema.oaf.Result input) {
if (Optional.ofNullable(input.getLanguage()).isPresent()) {
out
.setLanguage(
Language.newInstance(input.getLanguage().getClassid(), input.getLanguage().getClassname()));
}
}
private static void mapInstance(Result out, eu.dnetlib.dhp.schema.oaf.Result input,
List<MasterDuplicate> eoscIds) {
if (Optional
.ofNullable(input.getInstance())
.isPresent()) {
out
.setInstance(
input
.getInstance()
.stream()
.map(i -> getCommunityInstance(i, input.getResulttype().getClassid(), eoscIds))
.collect(Collectors.toList()));
}
}
private static void mapOriginalId(Result out, eu.dnetlib.dhp.schema.oaf.Result input) {
out.setOriginalId(new ArrayList<>());
Optional
.ofNullable(input.getOriginalId())
.ifPresent(
v -> out
.setOriginalId(
input
.getOriginalId()
.stream()
.filter(s -> !s.startsWith("50|"))
.collect(Collectors.toList())));
}
private static void mapFormat(Result out, eu.dnetlib.dhp.schema.oaf.Result input) {
final List<String> formatList = new ArrayList<>();
Optional
.ofNullable(input.getFormat())
.ifPresent(value -> value.forEach(f -> formatList.add(f.getValue())));
out.setFormat(formatList);
}
private static void mapMeasure(Result out, eu.dnetlib.dhp.schema.oaf.Result input) {
if (Optional.ofNullable(input.getMeasures()).isPresent()) {
Indicator i = new Indicator();
UsageCounts uc = new UsageCounts();
input.getMeasures().forEach(m -> {
if (m.getId().equals("downloads")) {
uc.setDownloads(m.getUnit().get(0).getValue());
}
if (m.getId().equals("views")) {
uc.setViews(m.getUnit().get(0).getValue());
}
});
if (!uc.isEmpty()) {
i.setUsageCounts(uc);
out.setIndicator(i);
}
}
}
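// mapMeasure illustration with hypothetical values: measures [{id: "downloads", unit: [{value: "10"}]},
// {id: "views", unit: [{value: "25"}]}] become indicator.usageCounts = {downloads: "10", views: "25"};
// results without measures get no indicator at all.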
private static void mapEmbargo(Result out, eu.dnetlib.dhp.schema.oaf.Result input) {
if (Optional.ofNullable(input.getEmbargoenddate()).isPresent()) {
out.setEmbargoenddate(input.getEmbargoenddate().getValue());
}
}
private static void mapDescription(Result out, eu.dnetlib.dhp.schema.oaf.Result input) {
final List<String> descriptionList = new ArrayList<>();
Optional
.ofNullable(input.getDescription())
.ifPresent(value -> value.forEach(d -> descriptionList.add(d.getValue())));
out.setDescription(descriptionList);
}
private static void mapCoverage(Result out, eu.dnetlib.dhp.schema.oaf.Result input) {
final List<String> coverageList = new ArrayList<>();
Optional
.ofNullable(input.getCoverage())
.ifPresent(value -> value.stream().forEach(c -> coverageList.add(c.getValue())));
out.setCoverage(coverageList);
}
private static void mapCountry(Result out, eu.dnetlib.dhp.schema.oaf.Result input) {
Optional
.ofNullable(input.getCountry())
.ifPresent(
value -> out
.setCountry(
value
.stream()
.map(
c -> {
if (c.getClassid().equals((ModelConstants.UNKNOWN))) {
return null;
}
ResultCountry country = new ResultCountry();
country.setCode(c.getClassid());
country.setLabel(c.getClassname());
Optional
.ofNullable(c.getDataInfo())
.ifPresent(
provenance -> country
.setProvenance(
Provenance
.newInstance(
provenance
.getProvenanceaction()
.getClassname(),
c.getDataInfo().getTrust())));
return country;
})
.filter(Objects::nonNull)
.collect(Collectors.toList())));
}
private static void mapContributor(Result out, eu.dnetlib.dhp.schema.oaf.Result input) {
final List<String> contributorList = new ArrayList<>();
Optional
.ofNullable(input.getContributor())
.ifPresent(value -> value.stream().forEach(c -> contributorList.add(c.getValue())));
out.setContributor(contributorList);
}
private static void mapAccessRight(Result out, eu.dnetlib.dhp.schema.oaf.Result input) {
// I do not map Access Right UNKNOWN or OTHER
Optional<Qualifier> oar = Optional.ofNullable(input.getBestaccessright());
if (oar.isPresent() && Constants.ACCESS_RIGHTS_COAR_MAP.containsKey(oar.get().getClassid())) {
String code = Constants.ACCESS_RIGHTS_COAR_MAP.get(oar.get().getClassid());
out
.setBestaccessright(
BestAccessRight
.newInstance(
code,
Constants.COAR_CODE_LABEL_MAP.get(code),
Constants.COAR_ACCESS_RIGHT_SCHEMA));
}
}
private static void mapAuthor(Result out, eu.dnetlib.dhp.schema.oaf.Result input) {
Optional
.ofNullable(input.getAuthor())
.ifPresent(
ats -> out.setAuthor(ats.stream().map(ResultMapper::getAuthor).collect(Collectors.toList())));
}
private static Map<String, List<eu.dnetlib.dhp.eosc.model.Subject>> createSubjectMap(
eu.dnetlib.dhp.schema.oaf.Result input) {
Map<String, List<eu.dnetlib.dhp.eosc.model.Subject>> map = new HashMap<>();
input.getSubject().stream().forEach(s -> {
String key = s.getQualifier().getClassid().toLowerCase();
if (!key.equalsIgnoreCase("http://www.abs.gov.au/ausstats/abs@.nsf/0/6BB427AB9696C225CA2574180004463E") &&
!key.equalsIgnoreCase("keyword") &&
!key.equalsIgnoreCase("eosc")) {
if (!map.containsKey(key)) {
map.put(key, new ArrayList<>());
}
eu.dnetlib.dhp.eosc.model.Subject subject = new eu.dnetlib.dhp.eosc.model.Subject();
subject.setValue(s.getValue());
Provenance p = getProvenance(s);
if (p != null) {
subject.setProvenance(p);
}
map.get(key).add(subject);
}
});
return map;
}
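// createSubjectMap illustration with hypothetical values: subjects [{classid: "FOS", value: "computer science"},
// {classid: "keyword", value: "covid"}] yield {"fos" -> [computer science]}; the "keyword" and "eosc" classids
// (and the ABS classification URI above) are skipped here, keywords being mapped separately in mapSubject.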
private static void addTypeSpecificInformation(Result out, eu.dnetlib.dhp.schema.oaf.Result input,
Optional<eu.dnetlib.dhp.schema.oaf.Qualifier> ort) throws NoAvailableEntityTypeException {
switch (ort.get().getClassid()) {
eu.dnetlib.dhp.schema.oaf.Qualifier ort) throws NoAvailableEntityTypeException {
switch (ort.getClassid()) {
case "publication":
Optional<Journal> journal = Optional
.ofNullable(((Publication) input).getJournal());
@ -332,6 +455,8 @@ public class ResultMapper implements Serializable {
out.setContainer(c);
out.setType(ModelConstants.PUBLICATION_DEFAULT_RESULTTYPE.getClassname());
}
if (Optional.ofNullable(((Publication) input).getFulltext()).isPresent())
mapFulltext(out, input);
break;
case "dataset":
Dataset id = (Dataset) input;
@ -404,15 +529,16 @@ public class ResultMapper implements Serializable {
.orElse(null));
out.setType(ModelConstants.ORP_DEFAULT_RESULTTYPE.getClassname());
if (Optional.ofNullable(((OtherResearchProduct) input).getFulltext()).isPresent())
mapFulltext(out, input);
break;
default:
throw new NoAvailableEntityTypeException();
}
}
private static Instance getGraphInstance(eu.dnetlib.dhp.schema.oaf.Instance i) {
Instance instance = new Instance();
private static eu.dnetlib.dhp.eosc.model.Instance getGraphInstance(eu.dnetlib.dhp.schema.oaf.Instance i) {
eu.dnetlib.dhp.eosc.model.Instance instance = new eu.dnetlib.dhp.eosc.model.Instance();
setCommonValue(i, instance);
@ -420,25 +546,48 @@ public class ResultMapper implements Serializable {
}
private static CommunityInstance getCommunityInstance(eu.dnetlib.dhp.schema.oaf.Instance i) {
CommunityInstance instance = new CommunityInstance();
private static eu.dnetlib.dhp.eosc.model.Instance getCommunityInstance(eu.dnetlib.dhp.schema.oaf.Instance i,
String resultType, List<MasterDuplicate> eoscIds) {
eu.dnetlib.dhp.eosc.model.Instance instance = new eu.dnetlib.dhp.eosc.model.Instance();
setCommonValue(i, instance);
instance
.setCollectedfrom(
CfHbKeyValue
.newInstance(i.getCollectedfrom().getKey(), i.getCollectedfrom().getValue()));
instance
.setHostedby(
CfHbKeyValue.newInstance(i.getHostedby().getKey(), i.getHostedby().getValue()));
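// Collect the EOSC identifiers of the datasources this instance refers to: each MasterDuplicate links a graph
// datasource id to its EOSC id, so entries matching either the hostedby or the collectedfrom key are gathered.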
List<MasterDuplicate> eoscDsIds = eoscIds
.stream()
.filter(
dm -> dm
.getGraphId()
.equals(i.getHostedby().getKey()) ||
dm
.getGraphId()
.equals(i.getCollectedfrom().getKey()))
.collect(Collectors.toList());
if (eoscDsIds.size() > 0) {
instance
.setEoscDsId(
eoscDsIds
.stream()
.map(dm -> dm.getEoscId())
.collect(Collectors.toList()));
}
if (resultType.equals("publication") ||
resultType.equals("other")) {
if (Optional.ofNullable(i.getFulltext()).isPresent())
instance.setFulltext(i.getFulltext());
}
return instance;
}
private static <I extends Instance> void setCommonValue(eu.dnetlib.dhp.schema.oaf.Instance i, I instance) {
private static void setCommonValue(eu.dnetlib.dhp.schema.oaf.Instance i,
eu.dnetlib.dhp.eosc.model.Instance instance) {
Optional<eu.dnetlib.dhp.schema.oaf.AccessRight> opAr = Optional.ofNullable(i.getAccessright());
if (opAr.isPresent() && Constants.ACCESS_RIGHTS_COAR_MAP.containsKey(opAr.get().getClassid())) {
@ -577,18 +726,15 @@ public class ResultMapper implements Serializable {
}
private static Subject getSubject(StructuredProperty s) {
Subject subject = new Subject();
subject.setSubject(SubjectSchemeValue.newInstance(s.getQualifier().getClassid(), s.getValue()));
private static Provenance getProvenance(StructuredProperty s) {
Optional<DataInfo> di = Optional.ofNullable(s.getDataInfo());
if (di.isPresent()) {
Provenance p = new Provenance();
p.setProvenance(di.get().getProvenanceaction().getClassname());
p.setTrust(di.get().getTrust());
subject.setProvenance(p);
return p;
}
return subject;
return null;
}
private static Author getAuthor(eu.dnetlib.dhp.schema.oaf.Author oa) {
@ -629,7 +775,8 @@ public class ResultMapper implements Serializable {
AuthorPidSchemeValue
.newInstance(
pid.getQualifier().getClassid(),
pid.getValue())
pid.getValue()),
null
);
}

View File

@ -9,9 +9,9 @@ import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import eu.dnetlib.dhp.application.ArgumentApplicationParser;
import eu.dnetlib.dhp.common.api.MissingConceptDoiException;
import eu.dnetlib.dhp.common.api.ZenodoAPIClient;
import eu.dnetlib.dhp.oa.graph.dump.exceptions.NoAvailableEntityTypeException;
import eu.dnetlib.dhp.oa.zenodoapi.MissingConceptDoiException;
import eu.dnetlib.dhp.oa.zenodoapi.ZenodoAPIClient;
public class SendToZenodoHDFS implements Serializable {
@ -25,7 +25,7 @@ public class SendToZenodoHDFS implements Serializable {
.toString(
SendToZenodoHDFS.class
.getResourceAsStream(
"/eu/dnetlib/dhp/oa/graph/dump/upload_zenodo.json")));
"/eu/dnetlib/dhp/oa/graph/dump/eosc_upload_zenodo.json")));
parser.parseArgument(args);
@ -83,8 +83,7 @@ public class SendToZenodoHDFS implements Serializable {
String name = pString.substring(pString.lastIndexOf("/") + 1);
FSDataInputStream inputStream = fileSystem.open(p);
zenodoApiClient.uploadIS(inputStream, name, fileStatus.getLen());
zenodoApiClient.uploadIS3(inputStream, name, fileSystem.getFileStatus(p).getLen());
}
}
@ -92,9 +91,9 @@ public class SendToZenodoHDFS implements Serializable {
zenodoApiClient.sendMretadata(metadata);
}
if (Boolean.TRUE.equals(publish)) {
zenodoApiClient.publish();
}
// if (Boolean.TRUE.equals(publish)) {
// zenodoApiClient.publish();
// }
}
}

View File

@ -0,0 +1,55 @@
package eu.dnetlib.dhp.oa.graph.dump;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import com.fasterxml.jackson.databind.ObjectMapper;
import eu.dnetlib.dhp.communityapi.model.*;
import eu.dnetlib.dhp.oa.graph.dump.eosc.CommunityMap;
public class UtilCommunityAPI {
private static final Logger log = LoggerFactory.getLogger(UtilCommunityAPI.class);
public CommunityMap getCommunityMap(boolean singleCommunity, String communityId)
throws IOException {
if (singleCommunity)
return getMap(Arrays.asList(getCommunity(communityId)));
return getMap(getValidCommunities());
}
private CommunityMap getMap(List<CommunityModel> communities) {
final CommunityMap map = new CommunityMap();
communities.forEach(c -> map.put(c.getId(), c.getName()));
return map;
}
private List<CommunityModel> getValidCommunities() throws IOException {
ObjectMapper mapper = new ObjectMapper();
return mapper
.readValue(eu.dnetlib.dhp.communityapi.QueryCommunityAPI.communities(), CommunitySummary.class)
.stream()
.filter(
community -> (community.getStatus().equals("all") || community.getStatus().equalsIgnoreCase("public"))
&&
(community.getType().equals("ri") || community.getType().equals("community")))
.collect(Collectors.toList());
}
private CommunityModel getCommunity(String id) throws IOException {
ObjectMapper mapper = new ObjectMapper();
return mapper
.readValue(eu.dnetlib.dhp.communityapi.QueryCommunityAPI.community(id), CommunityModel.class);
}
}
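A minimal usage sketch (the community id is hypothetical; assumes the community API endpoints are reachable and may throw IOException):

UtilCommunityAPI queryApi = new UtilCommunityAPI();
CommunityMap all = queryApi.getCommunityMap(false, null); // every public community of type ri or community
CommunityMap one = queryApi.getCommunityMap(true, "enermaps"); // a single community by identifier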

View File

@ -1,81 +0,0 @@
package eu.dnetlib.dhp.oa.graph.dump.community;
import static eu.dnetlib.dhp.common.SparkSessionSupport.runWithSparkSession;
import java.io.Serializable;
import java.util.Optional;
import java.util.stream.Collectors;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.function.FilterFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;
import eu.dnetlib.dhp.oa.graph.dump.Utils;
import eu.dnetlib.dhp.oa.model.community.CommunityResult;
import eu.dnetlib.dhp.oa.model.community.Context;
/**
This class splits the dumped results according to the research community (research initiative/infrastructure) they
are related to. The information about the community is found in the element "context.id" of the result. Since the
context found in a result may be associated not only with communities, a community map is provided to guide the
splitting process. Note the repartition(1) just before writing the results related to a community: this is a choice
imposed by uploading constraints (just one file for each community). As soon as a better solution is in place, remove
the repartition.
*/
public class CommunitySplit implements Serializable {
public void run(Boolean isSparkSessionManaged, String inputPath, String outputPath, String communityMapPath) {
SparkConf conf = new SparkConf();
runWithSparkSession(
conf,
isSparkSessionManaged,
spark -> {
Utils.removeOutputDir(spark, outputPath);
CommunityMap communityMap = Utils.getCommunityMap(spark, communityMapPath);
execSplit(spark, inputPath, outputPath, communityMap);
});
}
private static void execSplit(SparkSession spark, String inputPath, String outputPath,
CommunityMap communities) {
Dataset<CommunityResult> result = Utils
.readPath(spark, inputPath + "/publication", CommunityResult.class)
.union(Utils.readPath(spark, inputPath + "/dataset", CommunityResult.class))
.union(Utils.readPath(spark, inputPath + "/orp", CommunityResult.class))
.union(Utils.readPath(spark, inputPath + "/software", CommunityResult.class));
communities
.keySet()
.stream()
.forEach(c -> printResult(c, result, outputPath + "/" + communities.get(c).replace(" ", "_")));
}
private static void printResult(String c, Dataset<CommunityResult> result, String outputPath) {
Dataset<CommunityResult> communityProducts = result
.filter((FilterFunction<CommunityResult>) r -> containsCommunity(r, c));
communityProducts
.write()
.option("compression", "gzip")
.mode(SaveMode.Overwrite)
.json(outputPath);
}
private static boolean containsCommunity(CommunityResult r, String c) {
if (Optional.ofNullable(r.getContext()).isPresent()) {
return r
.getContext()
.stream()
.map(Context::getCode)
.collect(Collectors.toList())
.contains(c);
}
return false;
}
}

View File

@ -1,67 +0,0 @@
package eu.dnetlib.dhp.oa.graph.dump.community;
import java.io.Serializable;
import java.util.Optional;
import org.apache.commons.io.IOUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import eu.dnetlib.dhp.application.ArgumentApplicationParser;
import eu.dnetlib.dhp.oa.graph.dump.DumpProducts;
import eu.dnetlib.dhp.oa.model.community.CommunityResult;
import eu.dnetlib.dhp.schema.oaf.Result;
/**
Spark action to trigger the dump of results associated with a research community - research initiative/infrastructure.
The actual dump is performed via the class DumpProducts, which is also used for the entire graph dump.
*/
public class SparkDumpCommunityProducts implements Serializable {
private static final Logger log = LoggerFactory.getLogger(SparkDumpCommunityProducts.class);
public static void main(String[] args) throws Exception {
String jsonConfiguration = IOUtils
.toString(
SparkDumpCommunityProducts.class
.getResourceAsStream(
"/eu/dnetlib/dhp/oa/graph/dump/input_parameters.json"));
final ArgumentApplicationParser parser = new ArgumentApplicationParser(jsonConfiguration);
parser.parseArgument(args);
Boolean isSparkSessionManaged = Optional
.ofNullable(parser.get("isSparkSessionManaged"))
.map(Boolean::valueOf)
.orElse(Boolean.TRUE);
log.info("isSparkSessionManaged: {}", isSparkSessionManaged);
final String inputPath = parser.get("sourcePath");
log.info("inputPath: {}", inputPath);
final String outputPath = parser.get("outputPath");
log.info("outputPath: {}", outputPath);
final String resultClassName = parser.get("resultTableName");
log.info("resultTableName: {}", resultClassName);
String communityMapPath = parser.get("communityMapPath");
final String dumpType = Optional
.ofNullable(parser.get("dumpType"))
.map(String::valueOf)
.orElse("community");
Class<? extends Result> inputClazz = (Class<? extends Result>) Class.forName(resultClassName);
DumpProducts dump = new DumpProducts();
dump
.run(
isSparkSessionManaged, inputPath, outputPath, communityMapPath, inputClazz, CommunityResult.class,
dumpType);
}
}

View File

@ -1,50 +0,0 @@
package eu.dnetlib.dhp.oa.graph.dump.community;
import java.io.Serializable;
import java.util.Optional;
import org.apache.commons.io.IOUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import eu.dnetlib.dhp.application.ArgumentApplicationParser;
/**
Spark job to trigger the split of results associated with a research community - research initiative/infrastructure.
The actual split is performed by the class CommunitySplit.
*/
public class SparkSplitForCommunity implements Serializable {
private static final Logger log = LoggerFactory.getLogger(SparkSplitForCommunity.class);
public static void main(String[] args) throws Exception {
String jsonConfiguration = IOUtils
.toString(
SparkSplitForCommunity.class
.getResourceAsStream(
"/eu/dnetlib/dhp/oa/graph/dump/split_parameters.json"));
final ArgumentApplicationParser parser = new ArgumentApplicationParser(jsonConfiguration);
parser.parseArgument(args);
Boolean isSparkSessionManaged = Optional
.ofNullable(parser.get("isSparkSessionManaged"))
.map(Boolean::valueOf)
.orElse(Boolean.TRUE);
log.info("isSparkSessionManaged: {}", isSparkSessionManaged);
final String inputPath = parser.get("sourcePath");
log.info("inputPath: {}", inputPath);
final String outputPath = parser.get("outputPath");
log.info("outputPath: {}", outputPath);
final String communityMapPath = parser.get("communityMapPath");
CommunitySplit split = new CommunitySplit();
split.run(isSparkSessionManaged, inputPath, outputPath, communityMapPath);
}
}

View File

@ -1,84 +0,0 @@
package eu.dnetlib.dhp.oa.graph.dump.complete;
import java.io.Serializable;
import java.util.List;
/**
Deserialization of the information in the context needed to create Context Entities, and of the relations between
context entities and datasources and projects.
*/
public class ContextInfo implements Serializable {
private String id;
private String description;
private String type;
private String zenodocommunity;
private String name;
private List<String> projectList;
private List<String> datasourceList;
private List<String> subject;
public List<String> getSubject() {
return subject;
}
public void setSubject(List<String> subject) {
this.subject = subject;
}
public String getName() {
return name;
}
public void setName(String name) {
this.name = name;
}
public String getId() {
return id;
}
public void setId(String id) {
this.id = id;
}
public String getDescription() {
return description;
}
public void setDescription(String description) {
this.description = description;
}
public String getType() {
return type;
}
public void setType(String type) {
this.type = type;
}
public String getZenodocommunity() {
return zenodocommunity;
}
public void setZenodocommunity(String zenodocommunity) {
this.zenodocommunity = zenodocommunity;
}
public List<String> getProjectList() {
return projectList;
}
public void setProjectList(List<String> projectList) {
this.projectList = projectList;
}
public List<String> getDatasourceList() {
return datasourceList;
}
public void setDatasourceList(List<String> datasourceList) {
this.datasourceList = datasourceList;
}
}

View File

@ -1,110 +0,0 @@
package eu.dnetlib.dhp.oa.graph.dump.complete;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;
import java.util.function.Function;
import org.apache.commons.io.IOUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import eu.dnetlib.dhp.application.ArgumentApplicationParser;
import eu.dnetlib.dhp.oa.graph.dump.Utils;
import eu.dnetlib.dhp.oa.model.graph.ResearchInitiative;
import eu.dnetlib.enabling.is.lookup.rmi.ISLookUpException;
/**
Writes Context entities on HDFS. It queries the Information System at the lookup URL provided as a parameter and
collects the general information for contexts of type community or ri. The general information is the id of the
context, its label, the subjects associated to the context, its zenodo community, description and type. This
information is used to create a new Context Entity.
*/
public class CreateContextEntities implements Serializable {
private static final Logger log = LoggerFactory.getLogger(CreateContextEntities.class);
private final transient Configuration conf;
private final transient BufferedWriter writer;
public static void main(String[] args) throws Exception {
String jsonConfiguration = IOUtils
.toString(
CreateContextEntities.class
.getResourceAsStream(
"/eu/dnetlib/dhp/oa/graph/dump/input_entity_parameter.json"));
final ArgumentApplicationParser parser = new ArgumentApplicationParser(jsonConfiguration);
parser.parseArgument(args);
final String hdfsPath = parser.get("hdfsPath");
log.info("hdfsPath: {}", hdfsPath);
final String hdfsNameNode = parser.get("nameNode");
log.info("nameNode: {}", hdfsNameNode);
final String isLookUpUrl = parser.get("isLookUpUrl");
log.info("isLookUpUrl: {}", isLookUpUrl);
final CreateContextEntities cce = new CreateContextEntities(hdfsPath, hdfsNameNode);
log.info("Processing contexts...");
cce.execute(Process::getEntity, isLookUpUrl);
cce.close();
}
private void close() throws IOException {
writer.close();
}
public CreateContextEntities(String hdfsPath, String hdfsNameNode) throws IOException {
this.conf = new Configuration();
this.conf.set("fs.defaultFS", hdfsNameNode);
FileSystem fileSystem = FileSystem.get(this.conf);
Path hdfsWritePath = new Path(hdfsPath);
FSDataOutputStream fsDataOutputStream = null;
if (fileSystem.exists(hdfsWritePath)) {
fsDataOutputStream = fileSystem.append(hdfsWritePath);
} else {
fsDataOutputStream = fileSystem.create(hdfsWritePath);
}
CompressionCodecFactory factory = new CompressionCodecFactory(conf);
CompressionCodec codec = factory.getCodecByClassName("org.apache.hadoop.io.compress.GzipCodec");
this.writer = new BufferedWriter(new OutputStreamWriter(codec.createOutputStream(fsDataOutputStream),
StandardCharsets.UTF_8));
}
public <R extends ResearchInitiative> void execute(final Function<ContextInfo, R> producer, String isLookUpUrl)
throws ISLookUpException {
QueryInformationSystem queryInformationSystem = new QueryInformationSystem();
queryInformationSystem.setIsLookUp(Utils.getIsLookUpService(isLookUpUrl));
final Consumer<ContextInfo> consumer = ci -> writeEntity(producer.apply(ci));
queryInformationSystem.getContextInformation(consumer);
}
protected <R extends ResearchInitiative> void writeEntity(final R r) {
try {
writer.write(Utils.OBJECT_MAPPER.writeValueAsString(r));
writer.newLine();
} catch (final IOException e) {
throw new IllegalArgumentException(e);
}
}
}

View File

@ -1,128 +0,0 @@
package eu.dnetlib.dhp.oa.graph.dump.complete;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Objects;
import java.util.Optional;
import java.util.function.Consumer;
import java.util.function.Function;
import org.apache.commons.io.IOUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import eu.dnetlib.dhp.application.ArgumentApplicationParser;
import eu.dnetlib.dhp.oa.graph.dump.Utils;
import eu.dnetlib.dhp.oa.graph.dump.exceptions.MyRuntimeException;
import eu.dnetlib.dhp.oa.model.graph.*;
import eu.dnetlib.dhp.schema.common.ModelSupport;
import eu.dnetlib.dhp.schema.oaf.Datasource;
import eu.dnetlib.enabling.is.lookup.rmi.ISLookUpException;
/**
Writes the set of new Relations between the context and datasources. At the moment the relation between the context
and the project is not created because of the low coverage, in the profiles, of openaire ids related to projects.
*/
public class CreateContextRelation implements Serializable {
private static final Logger log = LoggerFactory.getLogger(CreateContextRelation.class);
private final transient Configuration conf;
private final transient BufferedWriter writer;
private final transient QueryInformationSystem queryInformationSystem;
private static final String CONTEX_RELATION_DATASOURCE = "contentproviders";
private static final String CONTEX_RELATION_PROJECT = "projects";
public static void main(String[] args) throws Exception {
String jsonConfiguration = IOUtils
.toString(
Objects
.requireNonNull(
CreateContextRelation.class
.getResourceAsStream(
"/eu/dnetlib/dhp/oa/graph/dump/input_entity_parameter.json")));
final ArgumentApplicationParser parser = new ArgumentApplicationParser(jsonConfiguration);
parser.parseArgument(args);
Boolean isSparkSessionManaged = Optional
.ofNullable(parser.get("isSparkSessionManaged"))
.map(Boolean::valueOf)
.orElse(Boolean.TRUE);
log.info("isSparkSessionManaged: {}", isSparkSessionManaged);
final String hdfsPath = parser.get("hdfsPath");
log.info("hdfsPath: {}", hdfsPath);
final String hdfsNameNode = parser.get("nameNode");
log.info("nameNode: {}", hdfsNameNode);
final String isLookUpUrl = parser.get("isLookUpUrl");
log.info("isLookUpUrl: {}", isLookUpUrl);
final CreateContextRelation cce = new CreateContextRelation(hdfsPath, hdfsNameNode, isLookUpUrl);
log.info("Creating relation for datasource...");
cce.execute(Process::getRelation, CONTEX_RELATION_DATASOURCE, ModelSupport.getIdPrefix(Datasource.class));
log.info("Creating relations for projects... ");
cce
.execute(
Process::getRelation, CONTEX_RELATION_PROJECT,
ModelSupport.getIdPrefix(eu.dnetlib.dhp.schema.oaf.Project.class));
cce.close();
}
private void close() throws IOException {
writer.close();
}
public CreateContextRelation(String hdfsPath, String hdfsNameNode, String isLookUpUrl)
throws IOException, ISLookUpException {
this.conf = new Configuration();
this.conf.set("fs.defaultFS", hdfsNameNode);
queryInformationSystem = new QueryInformationSystem();
queryInformationSystem.setIsLookUp(Utils.getIsLookUpService(isLookUpUrl));
queryInformationSystem.execContextRelationQuery();
FileSystem fileSystem = FileSystem.get(this.conf);
Path hdfsWritePath = new Path(hdfsPath);
FSDataOutputStream fsDataOutputStream = null;
if (fileSystem.exists(hdfsWritePath)) {
fsDataOutputStream = fileSystem.append(hdfsWritePath);
} else {
fsDataOutputStream = fileSystem.create(hdfsWritePath);
}
this.writer = new BufferedWriter(new OutputStreamWriter(fsDataOutputStream, StandardCharsets.UTF_8));
}
public void execute(final Function<ContextInfo, List<Relation>> producer, String category, String prefix) {
final Consumer<ContextInfo> consumer = ci -> producer.apply(ci).forEach(this::writeEntity);
queryInformationSystem.getContextRelation(consumer, category, prefix);
}
protected void writeEntity(final Relation r) {
try {
writer.write(Utils.OBJECT_MAPPER.writeValueAsString(r));
writer.newLine();
} catch (final Exception e) {
throw new MyRuntimeException(e);
}
}
}

View File

@ -1,518 +0,0 @@
package eu.dnetlib.dhp.oa.graph.dump.complete;
import static eu.dnetlib.dhp.common.SparkSessionSupport.runWithSparkSession;
import java.io.Serializable;
import java.io.StringReader;
import java.util.*;
import java.util.stream.Collectors;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.function.FilterFunction;
import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;
import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.Node;
import org.dom4j.io.SAXReader;
import eu.dnetlib.dhp.oa.graph.dump.DumpProducts;
import eu.dnetlib.dhp.oa.graph.dump.Utils;
import eu.dnetlib.dhp.oa.model.*;
import eu.dnetlib.dhp.oa.model.graph.*;
import eu.dnetlib.dhp.oa.model.graph.Funder;
import eu.dnetlib.dhp.oa.model.graph.Project;
import eu.dnetlib.dhp.schema.common.ModelSupport;
import eu.dnetlib.dhp.schema.oaf.Field;
import eu.dnetlib.dhp.schema.oaf.Journal;
import eu.dnetlib.dhp.schema.oaf.OafEntity;
/**
Dumps of entities in the model defined in eu.dnetlib.dhp.schema.dump.oaf.graph. Results are dumped using the same
Mapper as for eu.dnetlib.dhp.schema.dump.oaf.community, while for the other entities the mapping is defined below.
*/
public class DumpGraphEntities implements Serializable {
public void run(Boolean isSparkSessionManaged,
String inputPath,
String outputPath,
Class<? extends OafEntity> inputClazz,
String communityMapPath) {
SparkConf conf = new SparkConf();
switch (ModelSupport.idPrefixMap.get(inputClazz)) {
case "50":
DumpProducts d = new DumpProducts();
d
.run(
isSparkSessionManaged, inputPath, outputPath, communityMapPath, inputClazz, GraphResult.class,
eu.dnetlib.dhp.oa.graph.dump.Constants.DUMPTYPE.COMPLETE.getType());
break;
case "40":
runWithSparkSession(
conf,
isSparkSessionManaged,
spark -> {
Utils.removeOutputDir(spark, outputPath);
projectMap(spark, inputPath, outputPath, inputClazz);
});
break;
case "20":
runWithSparkSession(
conf,
isSparkSessionManaged,
spark -> {
Utils.removeOutputDir(spark, outputPath);
organizationMap(spark, inputPath, outputPath, inputClazz);
});
break;
case "10":
runWithSparkSession(
conf,
isSparkSessionManaged,
spark -> {
Utils.removeOutputDir(spark, outputPath);
datasourceMap(spark, inputPath, outputPath, inputClazz);
});
break;
}
}
private static <E extends OafEntity> void datasourceMap(SparkSession spark, String inputPath, String outputPath,
Class<E> inputClazz) {
Utils
.readPath(spark, inputPath, inputClazz)
.map(
(MapFunction<E, Datasource>) d -> mapDatasource((eu.dnetlib.dhp.schema.oaf.Datasource) d),
Encoders.bean(Datasource.class))
.filter(Objects::nonNull)
.write()
.mode(SaveMode.Overwrite)
.option("compression", "gzip")
.json(outputPath);
}
private static <E extends OafEntity> void projectMap(SparkSession spark, String inputPath, String outputPath,
Class<E> inputClazz) {
Utils
.readPath(spark, inputPath, inputClazz)
.map(
(MapFunction<E, Project>) p -> mapProject((eu.dnetlib.dhp.schema.oaf.Project) p),
Encoders.bean(Project.class))
.write()
.mode(SaveMode.Overwrite)
.option("compression", "gzip")
.json(outputPath);
}
private static Datasource mapDatasource(eu.dnetlib.dhp.schema.oaf.Datasource d) {
Datasource datasource = new Datasource();
datasource.setId(d.getId());
Optional
.ofNullable(d.getOriginalId())
.ifPresent(
oId -> datasource.setOriginalId(oId.stream().filter(Objects::nonNull).collect(Collectors.toList())));
Optional
.ofNullable(d.getPid())
.ifPresent(
pids -> datasource.setPid(pids
.stream()
.map(p -> DatasourcePid.newInstance(p.getQualifier().getClassid(), p.getValue()))
.collect(Collectors.toList())));
Optional
.ofNullable(d.getDatasourcetype())
.ifPresent(
dsType -> datasource
.setDatasourcetype(DatasourceSchemeValue.newInstance(dsType.getClassid(), dsType.getClassname())));
Optional
.ofNullable(d.getOpenairecompatibility())
.ifPresent(v -> datasource.setOpenairecompatibility(v.getClassname()));
Optional
.ofNullable(d.getOfficialname())
.ifPresent(oname -> datasource.setOfficialname(oname.getValue()));
Optional
.ofNullable(d.getEnglishname())
.ifPresent(ename -> datasource.setEnglishname(ename.getValue()));
Optional
.ofNullable(d.getWebsiteurl())
.ifPresent(wsite -> datasource.setWebsiteurl(wsite.getValue()));
Optional
.ofNullable(d.getLogourl())
.ifPresent(lurl -> datasource.setLogourl(lurl.getValue()));
Optional
.ofNullable(d.getDateofvalidation())
.ifPresent(dval -> datasource.setDateofvalidation(dval.getValue()));
Optional
.ofNullable(d.getDescription())
.ifPresent(dex -> datasource.setDescription(dex.getValue()));
Optional
.ofNullable(d.getSubjects())
.ifPresent(
sbjs -> datasource.setSubjects(sbjs.stream().map(sbj -> sbj.getValue()).collect(Collectors.toList())));
Optional
.ofNullable(d.getOdpolicies())
.ifPresent(odp -> datasource.setPolicies(Arrays.asList(odp.getValue())));
Optional
.ofNullable(d.getOdlanguages())
.ifPresent(
langs -> datasource
.setLanguages(langs.stream().map(lang -> lang.getValue()).collect(Collectors.toList())));
Optional
.ofNullable(d.getOdcontenttypes())
.ifPresent(
ctypes -> datasource
.setContenttypes(ctypes.stream().map(ctype -> ctype.getValue()).collect(Collectors.toList())));
Optional
.ofNullable(d.getReleasestartdate())
.ifPresent(rd -> datasource.setReleasestartdate(rd.getValue()));
Optional
.ofNullable(d.getReleaseenddate())
.ifPresent(ed -> datasource.setReleaseenddate(ed.getValue()));
Optional
.ofNullable(d.getMissionstatementurl())
.ifPresent(ms -> datasource.setMissionstatementurl(ms.getValue()));
Optional
.ofNullable(d.getDatabaseaccesstype())
.ifPresent(ar -> datasource.setAccessrights(ar.getValue()));
Optional
.ofNullable(d.getDatauploadtype())
.ifPresent(dut -> datasource.setUploadrights(dut.getValue()));
Optional
.ofNullable(d.getDatabaseaccessrestriction())
.ifPresent(dar -> datasource.setDatabaseaccessrestriction(dar.getValue()));
Optional
.ofNullable(d.getDatauploadrestriction())
.ifPresent(dur -> datasource.setDatauploadrestriction(dur.getValue()));
Optional
.ofNullable(d.getVersioning())
.ifPresent(v -> datasource.setVersioning(v.getValue()));
Optional
.ofNullable(d.getCitationguidelineurl())
.ifPresent(cu -> datasource.setCitationguidelineurl(cu.getValue()));
Optional
.ofNullable(d.getPidsystems())
.ifPresent(ps -> datasource.setPidsystems(ps.getValue()));
Optional
.ofNullable(d.getCertificates())
.ifPresent(c -> datasource.setCertificates(c.getValue()));
Optional
.ofNullable(d.getPolicies())
.ifPresent(ps -> datasource.setPolicies(ps.stream().map(p -> p.getValue()).collect(Collectors.toList())));
Optional
.ofNullable(d.getJournal())
.ifPresent(j -> datasource.setJournal(getContainer(j)));
return datasource;
}
private static Container getContainer(Journal j) {
Container c = new Container();
Optional
.ofNullable(j.getName())
.ifPresent(n -> c.setName(n));
Optional
.ofNullable(j.getIssnPrinted())
.ifPresent(issnp -> c.setIssnPrinted(issnp));
Optional
.ofNullable(j.getIssnOnline())
.ifPresent(issno -> c.setIssnOnline(issno));
Optional
.ofNullable(j.getIssnLinking())
.ifPresent(isnl -> c.setIssnLinking(isnl));
Optional
.ofNullable(j.getEp())
.ifPresent(ep -> c.setEp(ep));
Optional
.ofNullable(j.getIss())
.ifPresent(iss -> c.setIss(iss));
Optional
.ofNullable(j.getSp())
.ifPresent(sp -> c.setSp(sp));
Optional
.ofNullable(j.getVol())
.ifPresent(vol -> c.setVol(vol));
Optional
.ofNullable(j.getEdition())
.ifPresent(edition -> c.setEdition(edition));
Optional
.ofNullable(j.getConferencedate())
.ifPresent(cdate -> c.setConferencedate(cdate));
Optional
.ofNullable(j.getConferenceplace())
.ifPresent(cplace -> c.setConferenceplace(cplace));
return c;
}
private static Project mapProject(eu.dnetlib.dhp.schema.oaf.Project p) throws DocumentException {
Project project = new Project();
Optional
.ofNullable(p.getId())
.ifPresent(id -> project.setId(id));
Optional
.ofNullable(p.getWebsiteurl())
.ifPresent(w -> project.setWebsiteurl(w.getValue()));
Optional
.ofNullable(p.getCode())
.ifPresent(code -> project.setCode(code.getValue()));
Optional
.ofNullable(p.getAcronym())
.ifPresent(acronym -> project.setAcronym(acronym.getValue()));
Optional
.ofNullable(p.getTitle())
.ifPresent(title -> project.setTitle(title.getValue()));
Optional
.ofNullable(p.getStartdate())
.ifPresent(sdate -> project.setStartdate(sdate.getValue()));
Optional
.ofNullable(p.getEnddate())
.ifPresent(edate -> project.setEnddate(edate.getValue()));
Optional
.ofNullable(p.getCallidentifier())
.ifPresent(cide -> project.setCallidentifier(cide.getValue()));
Optional
.ofNullable(p.getKeywords())
.ifPresent(key -> project.setKeywords(key.getValue()));
Optional<Field<String>> omandate = Optional.ofNullable(p.getOamandatepublications());
Optional<Field<String>> oecsc39 = Optional.ofNullable(p.getEcsc39());
boolean mandate = false;
if (omandate.isPresent()) {
if (omandate.get().getValue().equals("true")) {
mandate = true;
}
}
if (oecsc39.isPresent()) {
if (oecsc39.get().getValue().equals("true")) {
mandate = true;
}
}
project.setOpenaccessmandateforpublications(mandate);
project.setOpenaccessmandatefordataset(false);
Optional
.ofNullable(p.getEcarticle29_3())
.ifPresent(oamandate -> project.setOpenaccessmandatefordataset(oamandate.getValue().equals("true")));
project
.setSubject(
Optional
.ofNullable(p.getSubjects())
.map(subjs -> subjs.stream().map(s -> s.getValue()).collect(Collectors.toList()))
.orElse(new ArrayList<>()));
Optional
.ofNullable(p.getSummary())
.ifPresent(summary -> project.setSummary(summary.getValue()));
Optional<Float> ofundedamount = Optional.ofNullable(p.getFundedamount());
Optional<Field<String>> ocurrency = Optional.ofNullable(p.getCurrency());
Optional<Float> ototalcost = Optional.ofNullable(p.getTotalcost());
if (ocurrency.isPresent()) {
if (ofundedamount.isPresent()) {
if (ototalcost.isPresent()) {
project
.setGranted(
Granted.newInstance(ocurrency.get().getValue(), ototalcost.get(), ofundedamount.get()));
} else {
project.setGranted(Granted.newInstance(ocurrency.get().getValue(), ofundedamount.get()));
}
}
}
project
.setH2020programme(
Optional
.ofNullable(p.getH2020classification())
.map(
classification -> classification
.stream()
.map(
c -> Programme
.newInstance(
c.getH2020Programme().getCode(), c.getH2020Programme().getDescription()))
.collect(Collectors.toList()))
.orElse(new ArrayList<>()));
Optional<List<Field<String>>> ofundTree = Optional
.ofNullable(p.getFundingtree());
List<Funder> funList = new ArrayList<>();
if (ofundTree.isPresent()) {
for (Field<String> fundingtree : ofundTree.get()) {
funList.add(getFunder(fundingtree.getValue()));
}
}
project.setFunding(funList);
return project;
}
public static Funder getFunder(String fundingtree) throws DocumentException {
Funder f = new Funder();
final Document doc;
doc = new SAXReader().read(new StringReader(fundingtree));
f.setShortName(((org.dom4j.Node) (doc.selectNodes("//funder/shortname").get(0))).getText());
f.setName(((org.dom4j.Node) (doc.selectNodes("//funder/name").get(0))).getText());
f.setJurisdiction(((org.dom4j.Node) (doc.selectNodes("//funder/jurisdiction").get(0))).getText());
// f.setId(((org.dom4j.Node) (doc.selectNodes("//funder/id").get(0))).getText());
String id = "";
String description = "";
// List<Levels> fundings = new ArrayList<>();
int level = 0;
List<org.dom4j.Node> nodes = doc.selectNodes("//funding_level_" + level);
while (!nodes.isEmpty()) {
for (org.dom4j.Node n : nodes) {
List<org.dom4j.Node> node = n.selectNodes("./id");
id = node.get(0).getText();
id = id.substring(id.indexOf("::") + 2);
node = n.selectNodes("./description");
description += node.get(0).getText() + " - ";
}
level += 1;
nodes = doc.selectNodes("//funding_level_" + level);
}
if (!id.equals("")) {
Fundings fundings = new Fundings();
fundings.setId(id);
fundings.setDescription(description.substring(0, description.length() - 3).trim());
f.setFunding_stream(fundings);
}
return f;
}
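// Illustrative sketch, not part of the original class: the fundingtree layout below is
// an assumption inferred from the XPath expressions in getFunder, not taken from a real record.
// For a fragment such as
// <fundingtree>
//   <funder><shortname>EC</shortname><name>European Commission</name><jurisdiction>EU</jurisdiction></funder>
//   <funding_level_0><id>ec__________::H2020</id><description>H2020</description></funding_level_0>
// </fundingtree>
// getFunder would return a Funder with shortName "EC", name "European Commission",
// jurisdiction "EU", and a funding stream with id "H2020" (the substring after the
// first "::" of the level id) and description "H2020".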
private static <E extends OafEntity> void organizationMap(SparkSession spark, String inputPath, String outputPath,
Class<E> inputClazz) {
Utils
.readPath(spark, inputPath, inputClazz)
.map(
(MapFunction<E, Organization>) o -> mapOrganization((eu.dnetlib.dhp.schema.oaf.Organization) o),
Encoders.bean(Organization.class))
.filter((FilterFunction<Organization>) o -> o != null)
.write()
.mode(SaveMode.Overwrite)
.option("compression", "gzip")
.json(outputPath);
}
private static eu.dnetlib.dhp.oa.model.graph.Organization mapOrganization(
eu.dnetlib.dhp.schema.oaf.Organization org) {
if (org.getDataInfo().getDeletedbyinference())
return null;
Organization organization = new Organization();
Optional
.ofNullable(org.getLegalshortname())
.ifPresent(value -> organization.setLegalshortname(value.getValue()));
Optional
.ofNullable(org.getLegalname())
.ifPresent(value -> organization.setLegalname(value.getValue()));
Optional
.ofNullable(org.getWebsiteurl())
.ifPresent(value -> organization.setWebsiteurl(value.getValue()));
Optional
.ofNullable(org.getAlternativeNames())
.ifPresent(
value -> organization
.setAlternativenames(
value
.stream()
.map(v -> v.getValue())
.collect(Collectors.toList())));
Optional
.ofNullable(org.getCountry())
.ifPresent(
value -> {
if (!value.getClassid().equals(Constants.UNKNOWN)) {
organization.setCountry(Country.newInstance(value.getClassid(), value.getClassname()));
}
});
Optional
.ofNullable(org.getId())
.ifPresent(value -> organization.setId(value));
Optional
.ofNullable(org.getPid())
.ifPresent(
value -> organization
.setPid(
value
.stream()
.map(p -> OrganizationPid.newInstance(p.getQualifier().getClassid(), p.getValue()))
.collect(Collectors.toList())));
return organization;
}
}


@ -1,201 +0,0 @@
package eu.dnetlib.dhp.oa.graph.dump.complete;
import static eu.dnetlib.dhp.common.SparkSessionSupport.runWithSparkSession;
import java.io.Serializable;
import java.util.*;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;
import eu.dnetlib.dhp.oa.graph.dump.Utils;
import eu.dnetlib.dhp.oa.graph.dump.community.CommunityMap;
import eu.dnetlib.dhp.oa.model.Provenance;
import eu.dnetlib.dhp.oa.model.graph.Node;
import eu.dnetlib.dhp.oa.model.graph.RelType;
import eu.dnetlib.dhp.oa.model.graph.Relation;
import eu.dnetlib.dhp.schema.common.ModelConstants;
import eu.dnetlib.dhp.schema.oaf.KeyValue;
import eu.dnetlib.dhp.schema.oaf.Result;
/**
* Creates new Relations (as in eu.dnetlib.dhp.oa.model.graph.Relation) from the information in the Entity. The
* new Relations are created for the datasources in the collectedfrom and hostedby elements and for the contexts
* related to communities and research initiatives/infrastructures. For collectedfrom elements it creates: datasource
* -> provides -> result and result -> isProvidedBy -> datasource. For hostedby elements it creates: datasource ->
* hosts -> result and result -> isHostedBy -> datasource. For context elements it creates: context <-> isRelatedTo
* <-> result. Note for context: it takes the first provenance in the dataInfo; if more than one is present, the
* others are not dumped.
*/
public class Extractor implements Serializable {
public void run(Boolean isSparkSessionManaged,
String inputPath,
String outputPath,
Class<? extends Result> inputClazz,
String communityMapPath) {
SparkConf conf = new SparkConf();
runWithSparkSession(
conf,
isSparkSessionManaged,
spark -> {
Utils.removeOutputDir(spark, outputPath);
extractRelationResult(
spark, inputPath, outputPath, inputClazz, Utils.getCommunityMap(spark, communityMapPath));
});
}
private <R extends Result> void extractRelationResult(SparkSession spark,
String inputPath,
String outputPath,
Class<R> inputClazz,
CommunityMap communityMap) {
Set<Integer> hashCodes = new HashSet<>();
Utils
.readPath(spark, inputPath, inputClazz)
.flatMap((FlatMapFunction<R, Relation>) value -> {
List<Relation> relationList = new ArrayList<>();
extractRelationsFromInstance(hashCodes, value, relationList);
Set<String> communities = communityMap.keySet();
Optional
.ofNullable(value.getContext())
.ifPresent(contexts -> contexts.forEach(context -> {
String id = context.getId();
if (id.contains(":")) {
id = id.substring(0, id.indexOf(":"));
}
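// e.g. a context id of the (hypothetical) form "dh-ch::subcommunity" is trimmed to its
// community prefix "dh-ch" before the lookup in the community map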
if (communities.contains(id)) {
String contextId = Utils.getContextId(id);
Provenance provenance = Optional
.ofNullable(context.getDataInfo())
.map(
dinfo -> Optional
.ofNullable(dinfo.get(0).getProvenanceaction())
.map(
paction -> Provenance
.newInstance(
paction.getClassid(),
dinfo.get(0).getTrust()))
.orElse(null))
.orElse(null);
Relation r = getRelation(
value.getId(), contextId,
Constants.RESULT_ENTITY,
Constants.CONTEXT_ENTITY,
ModelConstants.IS_RELATED_TO, ModelConstants.RELATIONSHIP, provenance);
if (!hashCodes.contains(r.hashCode())) {
relationList.add(r);
hashCodes.add(r.hashCode());
}
r = getRelation(
contextId, value.getId(),
Constants.CONTEXT_ENTITY,
Constants.RESULT_ENTITY,
ModelConstants.IS_RELATED_TO,
ModelConstants.RELATIONSHIP, provenance);
if (!hashCodes.contains(r.hashCode())) {
relationList.add(r);
hashCodes.add(r.hashCode());
}
}
}));
return relationList.iterator();
}, Encoders.bean(Relation.class))
.write()
.option("compression", "gzip")
.mode(SaveMode.Overwrite)
.json(outputPath);
}
private <R extends Result> void extractRelationsFromInstance(Set<Integer> hashCodes, R value,
List<Relation> relationList) {
Optional
.ofNullable(value.getInstance())
.ifPresent(inst -> inst.forEach(instance -> {
Optional
.ofNullable(instance.getCollectedfrom())
.ifPresent(
cf -> getRelationPair(
value, relationList, cf,
ModelConstants.IS_PROVIDED_BY, ModelConstants.PROVIDES, hashCodes));
Optional
.ofNullable(instance.getHostedby())
.ifPresent(
hb -> getRelationPair(
value, relationList, hb,
Constants.IS_HOSTED_BY, Constants.HOSTS, hashCodes));
}));
}
private static <R extends Result> void getRelationPair(R value, List<Relation> relationList, KeyValue cf,
String resultDatasource, String datasourceResult,
Set<Integer> hashCodes) {
Provenance provenance = Optional
.ofNullable(cf.getDataInfo())
.map(
dinfo -> Optional
.ofNullable(dinfo.getProvenanceaction())
.map(
paction -> Provenance
.newInstance(
paction.getClassname(),
dinfo.getTrust()))
.orElse(
Provenance
.newInstance(
eu.dnetlib.dhp.oa.graph.dump.Constants.HARVESTED,
eu.dnetlib.dhp.oa.graph.dump.Constants.DEFAULT_TRUST)))
.orElse(
Provenance
.newInstance(
eu.dnetlib.dhp.oa.graph.dump.Constants.HARVESTED,
eu.dnetlib.dhp.oa.graph.dump.Constants.DEFAULT_TRUST));
Relation r = getRelation(
value.getId(),
cf.getKey(), Constants.RESULT_ENTITY, Constants.DATASOURCE_ENTITY,
resultDatasource, ModelConstants.PROVISION,
provenance);
if (!hashCodes.contains(r.hashCode())) {
relationList.add(r);
hashCodes.add(r.hashCode());
}
r = getRelation(
cf.getKey(), value.getId(),
Constants.DATASOURCE_ENTITY, Constants.RESULT_ENTITY,
datasourceResult, ModelConstants.PROVISION,
provenance);
if (!hashCodes.contains(r.hashCode())) {
relationList.add(r);
hashCodes.add(r.hashCode());
}
}
private static Relation getRelation(String source, String target, String sourceType, String targetType,
String relName, String relType, Provenance provenance) {
Relation r = new Relation();
r.setSource(Node.newInstance(source, sourceType));
r.setTarget(Node.newInstance(target, targetType));
r.setReltype(RelType.newInstance(relName, relType));
r.setProvenance(provenance);
return r;
}
}


@ -1,25 +0,0 @@
package eu.dnetlib.dhp.oa.graph.dump.complete;
import java.io.Serializable;
public class MergedRels implements Serializable {
private String organizationId;
private String representativeId;
public String getOrganizationId() {
return organizationId;
}
public void setOrganizationId(String organizationId) {
this.organizationId = organizationId;
}
public String getRepresentativeId() {
return representativeId;
}
public void setRepresentativeId(String representativeId) {
this.representativeId = representativeId;
}
}


@ -1,99 +0,0 @@
package eu.dnetlib.dhp.oa.graph.dump.complete;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
import org.apache.commons.lang3.StringUtils;
import eu.dnetlib.dhp.oa.graph.dump.Constants;
import eu.dnetlib.dhp.oa.graph.dump.Utils;
import eu.dnetlib.dhp.oa.graph.dump.exceptions.MyRuntimeException;
import eu.dnetlib.dhp.oa.model.Provenance;
import eu.dnetlib.dhp.oa.model.graph.*;
import eu.dnetlib.dhp.schema.common.ModelConstants;
import eu.dnetlib.dhp.schema.common.ModelSupport;
/**
* It processes the ContextInfo to produce a new Context Entity or a set of Relations between the generic
* context entity and the datasources/projects related to the context.
*/
public class Process implements Serializable {
@SuppressWarnings("unchecked")
public static <R extends ResearchInitiative> R getEntity(ContextInfo ci) {
try {
ResearchInitiative ri;
if (ci.getType().equals("community")) {
ri = new ResearchCommunity();
((ResearchCommunity) ri).setSubject(ci.getSubject());
ri.setType(Constants.RESEARCH_COMMUNITY);
} else {
ri = new ResearchInitiative();
ri.setType(Constants.RESEARCH_INFRASTRUCTURE);
}
ri.setId(Utils.getContextId(ci.getId()));
ri.setAcronym(ci.getId());
ri.setDescription(ci.getDescription());
ri.setName(ci.getName());
if (StringUtils.isNotEmpty(ci.getZenodocommunity())) {
ri.setZenodo_community(Constants.ZENODO_COMMUNITY_PREFIX + ci.getZenodocommunity());
}
return (R) ri;
} catch (final Exception e) {
throw new MyRuntimeException(e);
}
}
public static List<Relation> getRelation(ContextInfo ci) {
try {
List<Relation> relationList = new ArrayList<>();
ci
.getDatasourceList()
.forEach(ds -> {
String nodeType = ModelSupport.idPrefixEntity.get(ds.substring(0, 2));
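// the first two characters of the id encode the entity type, resolved through
// ModelSupport.idPrefixEntity (e.g. "10" maps to the datasource entity)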
String contextId = Utils.getContextId(ci.getId());
relationList
.add(
Relation
.newInstance(
Node
.newInstance(
contextId, eu.dnetlib.dhp.oa.model.graph.Constants.CONTEXT_ENTITY),
Node.newInstance(ds, nodeType),
RelType.newInstance(ModelConstants.IS_RELATED_TO, ModelConstants.RELATIONSHIP),
Provenance
.newInstance(
Constants.USER_CLAIM,
Constants.DEFAULT_TRUST)));
relationList
.add(
Relation
.newInstance(
Node.newInstance(ds, nodeType),
Node
.newInstance(
contextId, eu.dnetlib.dhp.oa.model.graph.Constants.CONTEXT_ENTITY),
RelType.newInstance(ModelConstants.IS_RELATED_TO, ModelConstants.RELATIONSHIP),
Provenance
.newInstance(
Constants.USER_CLAIM,
Constants.DEFAULT_TRUST)));
});
return relationList;
} catch (final Exception e) {
throw new MyRuntimeException(e);
}
}
}


@ -1,198 +0,0 @@
package eu.dnetlib.dhp.oa.graph.dump.complete;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.Element;
import org.dom4j.Node;
import org.dom4j.io.SAXReader;
import org.jetbrains.annotations.NotNull;
import org.xml.sax.SAXException;
import eu.dnetlib.dhp.schema.common.ModelSupport;
import eu.dnetlib.dhp.utils.DHPUtils;
import eu.dnetlib.enabling.is.lookup.rmi.ISLookUpException;
import eu.dnetlib.enabling.is.lookup.rmi.ISLookUpService;
public class QueryInformationSystem {
private ISLookUpService isLookUp;
private List<String> contextRelationResult;
private static final String XQUERY = "for $x in collection('/db/DRIVER/ContextDSResources/ContextDSResourceType') "
+
" where $x//CONFIGURATION/context[./@type='community' or ./@type='ri'] " +
" and $x//context/param[./@name = 'status']/text() = 'all' " +
" return " +
"$x//context";
private static final String XQUERY_ENTITY = "for $x in collection('/db/DRIVER/ContextDSResources/ContextDSResourceType') "
+
"where $x//context[./@type='community' or ./@type = 'ri'] and $x//context/param[./@name = 'status']/text() = 'all' return "
+
"concat(data($x//context/@id) , '@@', $x//context/param[./@name =\"name\"]/text(), '@@', " +
"$x//context/param[./@name=\"description\"]/text(), '@@', $x//context/param[./@name = \"subject\"]/text(), '@@', "
+
"$x//context/param[./@name = \"zenodoCommunity\"]/text(), '@@', $x//context/@type)";
public void getContextInformation(final Consumer<ContextInfo> consumer) throws ISLookUpException {
isLookUp
.quickSearchProfile(XQUERY_ENTITY)
.forEach(c -> {
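// each profile is serialized by XQUERY_ENTITY as six '@@'-separated fields:
// id@@name@@description@@subject@@zenodoCommunity@@type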
ContextInfo cinfo = new ContextInfo();
String[] cSplit = c.split("@@");
cinfo.setId(cSplit[0]);
cinfo.setName(cSplit[1]);
cinfo.setDescription(cSplit[2]);
if (!cSplit[3].trim().equals("")) {
cinfo.setSubject(Arrays.asList(cSplit[3].split(",")));
}
cinfo.setZenodocommunity(cSplit[4]);
cinfo.setType(cSplit[5]);
consumer.accept(cinfo);
});
}
public List<String> getContextRelationResult() {
return contextRelationResult;
}
public void setContextRelationResult(List<String> contextRelationResult) {
this.contextRelationResult = contextRelationResult;
}
public ISLookUpService getIsLookUp() {
return isLookUp;
}
public void setIsLookUp(ISLookUpService isLookUpService) {
this.isLookUp = isLookUpService;
}
public void execContextRelationQuery() throws ISLookUpException {
contextRelationResult = isLookUp.quickSearchProfile(XQUERY);
}
public void getContextRelation(final Consumer<ContextInfo> consumer, String category, String prefix) {
contextRelationResult.forEach(xml -> {
ContextInfo cinfo = new ContextInfo();
final Document doc;
try {
final SAXReader reader = new SAXReader();
reader.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
doc = reader.read(new StringReader(xml));
Element root = doc.getRootElement();
cinfo.setId(root.attributeValue("id"));
Iterator<Element> it = root.elementIterator();
while (it.hasNext()) {
Element el = it.next();
if (el.getName().equals("category")) {
String categoryId = el.attributeValue("id");
categoryId = categoryId.substring(categoryId.lastIndexOf("::") + 2);
if (categoryId.equals(category)) {
cinfo.setDatasourceList(getCategoryList(el, prefix));
}
}
}
consumer.accept(cinfo);
} catch (DocumentException | SAXException e) {
e.printStackTrace();
}
});
}
@NotNull
private List<String> getCategoryList(Element el, String prefix) {
List<String> datasourceList = new ArrayList<>();
for (Object node : el.selectNodes(".//concept")) {
String oid = getOpenaireId((Node) node, prefix);
if (oid != null)
datasourceList.add(oid);
}
return datasourceList;
}
private String getOpenaireId(Node el, String prefix) {
for (Object node : el.selectNodes(".//param")) {
Node n = (Node) node;
if (n.valueOf("./@name").equals("openaireId")) {
return prefix + "|" + n.getText();
}
}
return makeOpenaireId(el, prefix);
}
private String makeOpenaireId(Node el, String prefix) {
if (!prefix.equals(ModelSupport.entityIdPrefix.get("project"))) {
return null;
}
String funder = "";
String grantId = null;
String funding = null;
for (Object node : el.selectNodes(".//param")) {
Node n = (Node) node;
switch (n.valueOf("./@name")) {
case "funding":
funding = n.getText();
break;
case "funder":
funder = n.getText();
break;
case "CD_PROJECT_NUMBER":
grantId = n.getText();
break;
default:
break;
}
}
String nsp = null;
switch (funder.toLowerCase()) {
case "ec":
if (funding == null) {
return null;
}
if (funding.toLowerCase().contains("h2020")) {
nsp = "corda__h2020::";
} else {
nsp = "corda_______::";
}
break;
case "tubitak":
nsp = "tubitakf____::";
break;
case "dfg":
nsp = "dfgf________::";
break;
default:
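// funders without a dedicated namespace get their name lower-cased and right-padded
// with '_' to 12 characters, e.g. a (hypothetical) funder "nsf" yields "nsf_________::"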
StringBuilder bld = new StringBuilder();
bld.append(funder.toLowerCase());
for (int i = funder.length(); i < 12; i++)
bld.append("_");
bld.append("::");
nsp = bld.toString();
}
return prefix + "|" + nsp + DHPUtils.md5(grantId);
}
}


@ -1,122 +0,0 @@
package eu.dnetlib.dhp.oa.graph.dump.complete;
import static eu.dnetlib.dhp.common.SparkSessionSupport.runWithSparkSession;
import java.io.Serializable;
import java.util.Optional;
import org.apache.commons.io.IOUtils;
import org.apache.spark.SparkConf;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import eu.dnetlib.dhp.application.ArgumentApplicationParser;
import eu.dnetlib.dhp.oa.graph.dump.Utils;
import eu.dnetlib.dhp.oa.model.graph.GraphResult;
import eu.dnetlib.dhp.oa.model.graph.Relation;
/**
* Reads all the entities of the same type (Relation / Results) and saves them in the same folder
*/
public class SparkCollectAndSave implements Serializable {
private static final Logger log = LoggerFactory.getLogger(SparkCollectAndSave.class);
public static void main(String[] args) throws Exception {
String jsonConfiguration = IOUtils
.toString(
SparkCollectAndSave.class
.getResourceAsStream(
"/eu/dnetlib/dhp/oa/graph/dump/input_collect_and_save.json"));
final ArgumentApplicationParser parser = new ArgumentApplicationParser(jsonConfiguration);
parser.parseArgument(args);
Boolean isSparkSessionManaged = Optional
.ofNullable(parser.get("isSparkSessionManaged"))
.map(Boolean::valueOf)
.orElse(Boolean.TRUE);
log.info("isSparkSessionManaged: {}", isSparkSessionManaged);
final String inputPath = parser.get("sourcePath");
log.info("inputPath: {}", inputPath);
final String outputPath = parser.get("outputPath");
log.info("outputPath: {}", outputPath);
final Boolean aggregateResult = Optional
.ofNullable(parser.get("resultAggregation"))
.map(Boolean::valueOf)
.orElse(Boolean.TRUE);
SparkConf conf = new SparkConf();
runWithSparkSession(
conf,
isSparkSessionManaged,
spark -> {
Utils.removeOutputDir(spark, outputPath + "/result");
run(spark, inputPath, outputPath, aggregateResult);
});
}
private static void run(SparkSession spark, String inputPath, String outputPath, boolean aggregate) {
if (aggregate) {
Utils
.readPath(spark, inputPath + "/result/publication", GraphResult.class)
.union(Utils.readPath(spark, inputPath + "/result/dataset", GraphResult.class))
.union(Utils.readPath(spark, inputPath + "/result/otherresearchproduct", GraphResult.class))
.union(Utils.readPath(spark, inputPath + "/result/software", GraphResult.class))
.write()
.option("compression", "gzip")
.mode(SaveMode.Overwrite)
.json(outputPath + "/result");
} else {
write(
Utils
.readPath(spark, inputPath + "/result/publication", GraphResult.class),
outputPath + "/publication");
write(
Utils
.readPath(spark, inputPath + "/result/dataset", GraphResult.class),
outputPath + "/dataset");
write(
Utils
.readPath(spark, inputPath + "/result/otherresearchproduct", GraphResult.class),
outputPath + "/otheresearchproduct");
write(
Utils
.readPath(spark, inputPath + "/result/software", GraphResult.class),
outputPath + "/software");
}
Utils
.readPath(spark, inputPath + "/relation/publication", Relation.class)
.union(Utils.readPath(spark, inputPath + "/relation/dataset", Relation.class))
.union(Utils.readPath(spark, inputPath + "/relation/orp", Relation.class))
.union(Utils.readPath(spark, inputPath + "/relation/software", Relation.class))
.union(Utils.readPath(spark, inputPath + "/relation/contextOrg", Relation.class))
.union(Utils.readPath(spark, inputPath + "/relation/context", Relation.class))
.union(Utils.readPath(spark, inputPath + "/relation/relation", Relation.class))
.write()
.mode(SaveMode.Overwrite)
.option("compression", "gzip")
.json(outputPath + "/relation");
}
private static void write(Dataset<GraphResult> dataSet, String outputPath) {
dataSet
.write()
.option("compression", "gzip")
.mode(SaveMode.Overwrite)
.json(outputPath);
}
}


@ -1,54 +0,0 @@
package eu.dnetlib.dhp.oa.graph.dump.complete;
import java.io.Serializable;
import java.util.Optional;
import org.apache.commons.io.IOUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import eu.dnetlib.dhp.application.ArgumentApplicationParser;
import eu.dnetlib.dhp.schema.oaf.OafEntity;
/**
* Spark job that fires the dump for the entities
*/
public class SparkDumpEntitiesJob implements Serializable {
private static final Logger log = LoggerFactory.getLogger(SparkDumpEntitiesJob.class);
public static void main(String[] args) throws Exception {
String jsonConfiguration = IOUtils
.toString(
SparkDumpEntitiesJob.class
.getResourceAsStream(
"/eu/dnetlib/dhp/oa/graph/dump/input_parameters.json"));
final ArgumentApplicationParser parser = new ArgumentApplicationParser(jsonConfiguration);
parser.parseArgument(args);
Boolean isSparkSessionManaged = Optional
.ofNullable(parser.get("isSparkSessionManaged"))
.map(Boolean::valueOf)
.orElse(Boolean.TRUE);
log.info("isSparkSessionManaged: {}", isSparkSessionManaged);
final String inputPath = parser.get("sourcePath");
log.info("inputPath: {}", inputPath);
final String outputPath = parser.get("outputPath");
log.info("outputPath: {}", outputPath);
final String resultClassName = parser.get("resultTableName");
log.info("resultTableName: {}", resultClassName);
final String communityMapPath = parser.get("communityMapPath");
Class<? extends OafEntity> inputClazz = (Class<? extends OafEntity>) Class.forName(resultClassName);
DumpGraphEntities dg = new DumpGraphEntities();
dg.run(isSparkSessionManaged, inputPath, outputPath, inputClazz, communityMapPath);
}
}


@ -1,135 +0,0 @@
package eu.dnetlib.dhp.oa.graph.dump.complete;
import static eu.dnetlib.dhp.common.SparkSessionSupport.runWithSparkSession;
import java.io.Serializable;
import java.util.Collections;
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;
import org.apache.commons.io.IOUtils;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.function.FilterFunction;
import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import eu.dnetlib.dhp.application.ArgumentApplicationParser;
import eu.dnetlib.dhp.oa.graph.dump.Utils;
import eu.dnetlib.dhp.oa.model.Provenance;
import eu.dnetlib.dhp.oa.model.graph.Node;
import eu.dnetlib.dhp.oa.model.graph.RelType;
import eu.dnetlib.dhp.schema.common.ModelSupport;
import eu.dnetlib.dhp.schema.oaf.DataInfo;
import eu.dnetlib.dhp.schema.oaf.Relation;
/**
* Dumps eu.dnetlib.dhp.schema.oaf.Relation into eu.dnetlib.dhp.oa.model.graph.Relation
*/
public class SparkDumpRelationJob implements Serializable {
private static final Logger log = LoggerFactory.getLogger(SparkDumpRelationJob.class);
public static void main(String[] args) throws Exception {
String jsonConfiguration = IOUtils
.toString(
SparkDumpRelationJob.class
.getResourceAsStream(
"/eu/dnetlib/dhp/oa/graph/dump/input_relationdump_parameters.json"));
final ArgumentApplicationParser parser = new ArgumentApplicationParser(jsonConfiguration);
parser.parseArgument(args);
Boolean isSparkSessionManaged = Optional
.ofNullable(parser.get("isSparkSessionManaged"))
.map(Boolean::valueOf)
.orElse(Boolean.TRUE);
log.info("isSparkSessionManaged: {}", isSparkSessionManaged);
final String inputPath = parser.get("sourcePath");
log.info("inputPath: {}", inputPath);
final String outputPath = parser.get("outputPath");
log.info("outputPath: {}", outputPath);
Optional<String> rs = Optional.ofNullable(parser.get("removeSet"));
final Set<String> removeSet = new HashSet<>();
if (rs.isPresent()) {
Collections.addAll(removeSet, rs.get().split(";"));
}
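// removeSet is a ';'-separated list of relation classes to be skipped in the dump,
// e.g. a (hypothetical) value "merges;isMergedIn"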
SparkConf conf = new SparkConf();
runWithSparkSession(
conf,
isSparkSessionManaged,
spark -> {
Utils.removeOutputDir(spark, outputPath);
dumpRelation(spark, inputPath, outputPath, removeSet);
});
}
private static void dumpRelation(SparkSession spark, String inputPath, String outputPath, Set<String> removeSet) {
Dataset<Relation> relations = Utils.readPath(spark, inputPath, Relation.class);
relations
.filter((FilterFunction<Relation>) r -> !removeSet.contains(r.getRelClass()))
.map((MapFunction<Relation, eu.dnetlib.dhp.oa.model.graph.Relation>) relation -> {
eu.dnetlib.dhp.oa.model.graph.Relation relNew = new eu.dnetlib.dhp.oa.model.graph.Relation();
relNew
.setSource(
Node
.newInstance(
relation.getSource(),
ModelSupport.idPrefixEntity.get(relation.getSource().substring(0, 2))));
relNew
.setTarget(
Node
.newInstance(
relation.getTarget(),
ModelSupport.idPrefixEntity.get(relation.getTarget().substring(0, 2))));
relNew
.setReltype(
RelType
.newInstance(
relation.getRelClass(),
relation.getSubRelType()));
Optional<DataInfo> odInfo = Optional.ofNullable(relation.getDataInfo());
if (odInfo.isPresent()) {
DataInfo dInfo = odInfo.get();
if (Optional.ofNullable(dInfo.getProvenanceaction()).isPresent() &&
Optional.ofNullable(dInfo.getProvenanceaction().getClassname()).isPresent()) {
relNew
.setProvenance(
Provenance
.newInstance(
dInfo.getProvenanceaction().getClassname(),
dInfo.getTrust()));
}
}
if (Boolean.TRUE.equals(relation.getValidated())) {
relNew.setValidated(relation.getValidated());
relNew.setValidationDate(relation.getValidationDate());
}
return relNew;
}, Encoders.bean(eu.dnetlib.dhp.oa.model.graph.Relation.class))
.write()
.option("compression", "gzip")
.mode(SaveMode.Overwrite)
.json(outputPath);
}
}


@ -1,54 +0,0 @@
package eu.dnetlib.dhp.oa.graph.dump.complete;
import java.io.Serializable;
import java.util.Optional;
import org.apache.commons.io.IOUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import eu.dnetlib.dhp.application.ArgumentApplicationParser;
import eu.dnetlib.dhp.schema.oaf.Result;
/**
* Spark job that fires the extraction of relations from entities
*/
public class SparkExtractRelationFromEntities implements Serializable {
private static final Logger log = LoggerFactory.getLogger(SparkExtractRelationFromEntities.class);
public static void main(String[] args) throws Exception {
String jsonConfiguration = IOUtils
.toString(
SparkExtractRelationFromEntities.class
.getResourceAsStream(
"/eu/dnetlib/dhp/oa/graph/dump/input_parameters.json"));
final ArgumentApplicationParser parser = new ArgumentApplicationParser(jsonConfiguration);
parser.parseArgument(args);
Boolean isSparkSessionManaged = Optional
.ofNullable(parser.get("isSparkSessionManaged"))
.map(Boolean::valueOf)
.orElse(Boolean.TRUE);
log.info("isSparkSessionManaged: {}", isSparkSessionManaged);
final String inputPath = parser.get("sourcePath");
log.info("inputPath: {}", inputPath);
final String outputPath = parser.get("outputPath");
log.info("outputPath: {}", outputPath);
final String resultClassName = parser.get("resultTableName");
log.info("resultTableName: {}", resultClassName);
final String communityMapPath = parser.get("communityMapPath");
Class<? extends Result> inputClazz = (Class<? extends Result>) Class.forName(resultClassName);
Extractor extractor = new Extractor();
extractor.run(isSparkSessionManaged, inputPath, outputPath, inputClazz, communityMapPath);
}
}


@ -1,179 +0,0 @@
package eu.dnetlib.dhp.oa.graph.dump.complete;
import static eu.dnetlib.dhp.common.SparkSessionSupport.runWithSparkSession;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.Optional;
import java.util.function.Consumer;
import org.apache.commons.io.IOUtils;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.*;
import org.jetbrains.annotations.NotNull;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import com.google.gson.Gson;
import eu.dnetlib.dhp.application.ArgumentApplicationParser;
import eu.dnetlib.dhp.oa.graph.dump.Utils;
import eu.dnetlib.dhp.oa.graph.dump.community.CommunityMap;
import eu.dnetlib.dhp.oa.model.Provenance;
import eu.dnetlib.dhp.oa.model.graph.Node;
import eu.dnetlib.dhp.oa.model.graph.RelType;
import eu.dnetlib.dhp.schema.common.ModelConstants;
import eu.dnetlib.dhp.schema.common.ModelSupport;
import eu.dnetlib.dhp.schema.oaf.Relation;
/**
* Creates new Relations between Context Entities and Organizations whose products are associated to the context. It
* produces relations such as: organization <-> isRelatedTo <-> context
*/
public class SparkOrganizationRelation implements Serializable {
private static final Logger log = LoggerFactory.getLogger(SparkOrganizationRelation.class);
public static void main(String[] args) throws Exception {
String jsonConfiguration = IOUtils
.toString(
SparkOrganizationRelation.class
.getResourceAsStream(
"/eu/dnetlib/dhp/oa/graph/dump/input_organization_parameters.json"));
final ArgumentApplicationParser parser = new ArgumentApplicationParser(jsonConfiguration);
parser.parseArgument(args);
Boolean isSparkSessionManaged = Optional
.ofNullable(parser.get("isSparkSessionManaged"))
.map(Boolean::valueOf)
.orElse(Boolean.TRUE);
log.info("isSparkSessionManaged: {}", isSparkSessionManaged);
final String inputPath = parser.get("sourcePath");
log.info("inputPath: {}", inputPath);
final String outputPath = parser.get("outputPath");
log.info("outputPath: {}", outputPath);
final OrganizationMap organizationMap = new Gson()
.fromJson(parser.get("organizationCommunityMap"), OrganizationMap.class);
final String serializedOrganizationMap = new Gson().toJson(organizationMap);
log.info("organization map : {}", serializedOrganizationMap);
final String communityMapPath = parser.get("communityMapPath");
log.info("communityMapPath: {}", communityMapPath);
SparkConf conf = new SparkConf();
runWithSparkSession(
conf,
isSparkSessionManaged,
spark -> {
Utils.removeOutputDir(spark, outputPath);
extractRelation(spark, inputPath, organizationMap, outputPath, communityMapPath);
});
}
private static void extractRelation(SparkSession spark, String inputPath, OrganizationMap organizationMap,
String outputPath, String communityMapPath) {
CommunityMap communityMap = Utils.getCommunityMap(spark, communityMapPath);
Dataset<Relation> relationDataset = Utils.readPath(spark, inputPath, Relation.class);
relationDataset.createOrReplaceTempView("relation");
List<eu.dnetlib.dhp.oa.model.graph.Relation> relList = new ArrayList<>();
Dataset<MergedRels> mergedRelsDataset = spark
.sql(
"SELECT target organizationId, source representativeId " +
"FROM relation " +
"WHERE datainfo.deletedbyinference = false " +
"AND relclass = 'merges' " +
"AND substr(source, 1, 2) = '20'")
.as(Encoders.bean(MergedRels.class));
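// substr(source, 1, 2) = '20' keeps only merge relations whose source is an
// organization, '20' being the id prefix associated to organizations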
mergedRelsDataset.map((MapFunction<MergedRels, MergedRels>) mergedRels -> {
if (organizationMap.containsKey(mergedRels.getOrganizationId())) {
return mergedRels;
}
return null;
}, Encoders.bean(MergedRels.class))
.filter(Objects::nonNull)
.collectAsList()
.forEach(getMergedRelsConsumer(organizationMap, relList, communityMap));
organizationMap
.keySet()
.forEach(
oId -> organizationMap
.get(oId)
.forEach(community -> {
if (communityMap.containsKey(community)) {
addRelations(relList, community, oId);
}
}));
spark
.createDataset(relList, Encoders.bean(eu.dnetlib.dhp.oa.model.graph.Relation.class))
.write()
.mode(SaveMode.Overwrite)
.option("compression", "gzip")
.json(outputPath);
}
@NotNull
private static Consumer<MergedRels> getMergedRelsConsumer(OrganizationMap organizationMap,
List<eu.dnetlib.dhp.oa.model.graph.Relation> relList, CommunityMap communityMap) {
return mergedRels -> {
String oId = mergedRels.getOrganizationId();
organizationMap
.get(oId)
.forEach(community -> {
if (communityMap.containsKey(community)) {
addRelations(relList, community, mergedRels.getRepresentativeId());
}
});
organizationMap.remove(oId);
};
}
private static void addRelations(List<eu.dnetlib.dhp.oa.model.graph.Relation> relList, String community,
String organization) {
String id = Utils.getContextId(community);
log.info("create relation for organization: {}", organization);
relList
.add(
eu.dnetlib.dhp.oa.model.graph.Relation
.newInstance(
Node.newInstance(id, Constants.CONTEXT_ENTITY),
Node.newInstance(organization, ModelSupport.idPrefixEntity.get(organization.substring(0, 2))),
RelType.newInstance(ModelConstants.IS_RELATED_TO, ModelConstants.RELATIONSHIP),
Provenance
.newInstance(
eu.dnetlib.dhp.oa.graph.dump.Constants.USER_CLAIM,
eu.dnetlib.dhp.oa.graph.dump.Constants.DEFAULT_TRUST)));
relList
.add(
eu.dnetlib.dhp.oa.model.graph.Relation
.newInstance(
Node.newInstance(organization, ModelSupport.idPrefixEntity.get(organization.substring(0, 2))),
Node.newInstance(id, Constants.CONTEXT_ENTITY),
RelType.newInstance(ModelConstants.IS_RELATED_TO, ModelConstants.RELATIONSHIP),
Provenance
.newInstance(
eu.dnetlib.dhp.oa.graph.dump.Constants.USER_CLAIM,
eu.dnetlib.dhp.oa.graph.dump.Constants.DEFAULT_TRUST)));
}
}


@ -1,136 +0,0 @@
package eu.dnetlib.dhp.oa.graph.dump.complete;
import static eu.dnetlib.dhp.common.SparkSessionSupport.runWithSparkSession;
import java.io.Serializable;
import java.util.Optional;
import org.apache.commons.io.IOUtils;
import org.apache.spark.SparkConf;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import eu.dnetlib.dhp.application.ArgumentApplicationParser;
import eu.dnetlib.dhp.oa.graph.dump.Utils;
import eu.dnetlib.dhp.schema.oaf.*;
/**
* It selects the valid relations among those present in the graph. One relation is valid if it is not deletedbyinference
* and if both the source and the target nodes are present in the graph and are not deleted by inference nor invisible.
* To check this, a view of the ids of all the entities in the graph is created, and the relations for which a join
* exists with this view for both the source and the target are selected.
*/
public class SparkSelectValidRelationsJob implements Serializable {
private static final Logger log = LoggerFactory.getLogger(SparkSelectValidRelationsJob.class);
public static void main(String[] args) throws Exception {
String jsonConfiguration = IOUtils
.toString(
SparkSelectValidRelationsJob.class
.getResourceAsStream(
"/eu/dnetlib/dhp/oa/graph/dump/input_relationdump_parameters.json"));
final ArgumentApplicationParser parser = new ArgumentApplicationParser(jsonConfiguration);
parser.parseArgument(args);
Boolean isSparkSessionManaged = Optional
.ofNullable(parser.get("isSparkSessionManaged"))
.map(Boolean::valueOf)
.orElse(Boolean.TRUE);
log.info("isSparkSessionManaged: {}", isSparkSessionManaged);
final String inputPath = parser.get("sourcePath");
log.info("inputPath: {}", inputPath);
final String outputPath = parser.get("outputPath");
log.info("outputPath: {}", outputPath);
SparkConf conf = new SparkConf();
runWithSparkSession(
conf,
isSparkSessionManaged,
spark -> {
Utils.removeOutputDir(spark, outputPath);
selectValidRelation(spark, inputPath, outputPath);
});
}
private static void selectValidRelation(SparkSession spark, String inputPath, String outputPath) {
Dataset<Relation> relation = Utils.readPath(spark, inputPath + "/relation", Relation.class);
Dataset<Publication> publication = Utils.readPath(spark, inputPath + "/publication", Publication.class);
Dataset<eu.dnetlib.dhp.schema.oaf.Dataset> dataset = Utils
.readPath(spark, inputPath + "/dataset", eu.dnetlib.dhp.schema.oaf.Dataset.class);
Dataset<Software> software = Utils.readPath(spark, inputPath + "/software", Software.class);
Dataset<OtherResearchProduct> other = Utils
.readPath(spark, inputPath + "/otherresearchproduct", OtherResearchProduct.class);
Dataset<Organization> organization = Utils.readPath(spark, inputPath + "/organization", Organization.class);
Dataset<Project> project = Utils.readPath(spark, inputPath + "/project", Project.class);
Dataset<Datasource> datasource = Utils.readPath(spark, inputPath + "/datasource", Datasource.class);
relation.createOrReplaceTempView("relation");
publication.createOrReplaceTempView("publication");
dataset.createOrReplaceTempView("dataset");
other.createOrReplaceTempView("other");
software.createOrReplaceTempView("software");
organization.createOrReplaceTempView("organization");
project.createOrReplaceTempView("project");
datasource.createOrReplaceTempView("datasource");
spark
.sql(
"SELECT id " +
"FROM publication " +
"WHERE datainfo.deletedbyinference = false AND datainfo.invisible = false " +
"UNION ALL " +
"SELECT id " +
"FROM dataset " +
"WHERE datainfo.deletedbyinference = false AND datainfo.invisible = false " +
"UNION ALL " +
"SELECT id " +
"FROM other " +
"WHERE datainfo.deletedbyinference = false AND datainfo.invisible = false " +
"UNION ALL " +
"SELECT id " +
"FROM software " +
"WHERE datainfo.deletedbyinference = false AND datainfo.invisible = false " +
"UNION ALL " +
"SELECT id " +
"FROM organization " +
"WHERE datainfo.deletedbyinference = false AND datainfo.invisible = false " +
"UNION ALL " +
"SELECT id " +
"FROM project " +
"WHERE datainfo.deletedbyinference = false AND datainfo.invisible = false " +
"UNION ALL " +
"SELECT id " +
"FROM datasource " +
"WHERE datainfo.deletedbyinference = false AND datainfo.invisible = false ")
.createOrReplaceTempView("identifiers");
spark
.sql(
"SELECT relation.* " +
"FROM relation " +
"JOIN identifiers i1 " +
"ON source = i1.id " +
"JOIN identifiers i2 " +
"ON target = i2.id " +
"WHERE datainfo.deletedbyinference = false")
.as(Encoders.bean(Relation.class))
.write()
.option("compression", "gzip")
.mode(SaveMode.Overwrite)
.json(outputPath);
}
}


@ -1,5 +1,5 @@
-package eu.dnetlib.dhp.oa.graph.dump.community;
+package eu.dnetlib.dhp.oa.graph.dump.eosc;
import java.io.Serializable;
import java.util.HashMap;


@ -1,5 +1,5 @@
-package eu.dnetlib.dhp.oa.graph.dump.complete;
+package eu.dnetlib.dhp.oa.graph.dump.eosc;
import java.io.Serializable;


@ -0,0 +1,102 @@
package eu.dnetlib.dhp.oa.graph.dump.eosc;
import static eu.dnetlib.dhp.common.SparkSessionSupport.runWithSparkSession;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import org.apache.commons.io.IOUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.function.FilterFunction;
import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.*;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import com.fasterxml.jackson.databind.ObjectMapper;
import eu.dnetlib.dhp.application.ArgumentApplicationParser;
import eu.dnetlib.dhp.schema.oaf.Datasource;
/**
* @author miriam.baglioni
* @Date 20/09/23
*/
public class EoscDatasourceId implements Serializable {
private static final Logger log = LoggerFactory.getLogger(EoscDatasourceId.class);
public static void main(String[] args) throws Exception {
String jsonConfiguration = IOUtils
.toString(
EoscDatasourceId.class
.getResourceAsStream(
"/eu/dnetlib/dhp/oa/graph/dump/eosc_identifiers_parameters.json"));
final ArgumentApplicationParser parser = new ArgumentApplicationParser(jsonConfiguration);
parser.parseArgument(args);
Boolean isSparkSessionManaged = Optional
.ofNullable(parser.get("isSparkSessionManaged"))
.map(Boolean::valueOf)
.orElse(Boolean.TRUE);
log.info("isSparkSessionManaged: {}", isSparkSessionManaged);
final String outputPath = parser.get("outputPath");
log.info("outputPath: {}", outputPath);
final String inputPath = parser.get("sourcePath");
log.info("inputPath: {}", inputPath);
SparkConf conf = new SparkConf();
runWithSparkSession(
conf,
isSparkSessionManaged,
spark -> {
Utils.removeOutputDir(spark, outputPath);
mapEoscIdentifier(spark, inputPath, outputPath);
});
}
private static void mapEoscIdentifier(SparkSession spark, String inputPath, String outputPath) {
final StructType structureSchema = new StructType()
.add("masterId", DataTypes.StringType)
.add("masterName", DataTypes.StringType)
.add("duplicateId", DataTypes.StringType);
org.apache.spark.sql.Dataset<Row> df = spark.read().schema(structureSchema).parquet(inputPath);
// spark
// .read()
// .schema(Encoders.bean(Datasource.class).schema())
// .json(inputPath + "/datasource")
// .withColumn("orId", functions.explode(new Column("originalId")))
df
.filter(new Column("duplicateId").startsWith("eosc"))
.select(
new Column("masterId").as("graphId"),
functions.substring_index(new Column("duplicateId"), "::", -1).as("eoscId"))
.filter((FilterFunction<Row>) r -> !((String) r.getAs("graphId")).startsWith("10|eosc"))
.write()
.mode(SaveMode.Overwrite)
.option("compression", "gzip")
.save(outputPath);
}
}


@ -0,0 +1,47 @@
package eu.dnetlib.dhp.oa.graph.dump.eosc;
import org.apache.commons.io.IOUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import eu.dnetlib.dhp.application.ArgumentApplicationParser;
/**
* @author miriam.baglioni
* @Date 25/09/23
*/
public class EoscMasterDuplicate {
private static final Logger log = LoggerFactory.getLogger(EoscMasterDuplicate.class);
public static void main(final String[] args) throws Exception {
final ArgumentApplicationParser parser = new ArgumentApplicationParser(
IOUtils
.toString(
EoscMasterDuplicate.class
.getResourceAsStream(
"/eu/dnetlib/dhp/oa/graph/dump/datasourcemaster_parameters.json")));
parser.parseArgument(args);
final String dbUrl = parser.get("postgresUrl");
log.info("postgresUrl: {}", dbUrl);
final String dbUser = parser.get("postgresUser");
log.info("postgresUser: {}", dbUser);
final String dbPassword = parser.get("postgresPassword");
log.info("postgresPassword: {}", dbPassword);
final String hdfsPath = parser.get("hdfsPath");
log.info("hdfsPath: {}", hdfsPath);
final String hdfsNameNode = parser.get("hdfsNameNode");
log.info("hdfsNameNode: {}", hdfsNameNode);
ReadDatasourceMasterDuplicateFromDB.execute(dbUrl, dbUser, dbPassword, hdfsPath, hdfsNameNode);
}
}


@ -0,0 +1,161 @@
package eu.dnetlib.dhp.oa.graph.dump.eosc;
import static eu.dnetlib.dhp.common.SparkSessionSupport.runWithSparkSession;
import java.io.Serializable;
import java.util.*;
import org.apache.commons.io.IOUtils;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.function.FilterFunction;
import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.api.java.function.MapGroupsFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import eu.dnetlib.dhp.application.ArgumentApplicationParser;
import eu.dnetlib.dhp.eosc.model.Affiliation;
import eu.dnetlib.dhp.eosc.model.OrganizationPid;
import eu.dnetlib.dhp.eosc.model.Result;
import eu.dnetlib.dhp.schema.common.ModelConstants;
import eu.dnetlib.dhp.schema.oaf.Organization;
import eu.dnetlib.dhp.schema.oaf.Relation;
import scala.Tuple2;
/**
* @author miriam.baglioni
* @Date 27/07/22
*/
public class ExtendEoscResultWithOrganization implements Serializable {
private static final Logger log = LoggerFactory.getLogger(ExtendEoscResultWithOrganization.class);
public static void main(String[] args) throws Exception {
String jsonConfiguration = IOUtils
.toString(
ExtendEoscResultWithOrganization.class
.getResourceAsStream(
"/eu/dnetlib/dhp/oa/graph/dump/eosc_extend_result_with_organization_parameters.json"));
final ArgumentApplicationParser parser = new ArgumentApplicationParser(jsonConfiguration);
parser.parseArgument(args);
Boolean isSparkSessionManaged = Optional
.ofNullable(parser.get("isSparkSessionManaged"))
.map(Boolean::valueOf)
.orElse(Boolean.TRUE);
log.info("isSparkSessionManaged: {}", isSparkSessionManaged);
final String inputPath = parser.get("sourcePath");
log.info("inputPath: {}", inputPath);
final String workingPath = parser.get("workingPath");
log.info("workingPath: {}", workingPath);
SparkConf conf = new SparkConf();
runWithSparkSession(
conf,
isSparkSessionManaged,
spark -> {
Utils.removeOutputDir(spark, workingPath + "/affiliation");
addOrganizations(spark, inputPath, workingPath);
});
}
private static void addOrganizations(SparkSession spark, String inputPath, String workingPath) {
List<String> entities = Arrays.asList("publication", "dataset", "software", "otherresearchproduct");
entities
.parallelStream()
.forEach(
entity -> {
Dataset<Result> results = Utils
.readPath(spark, workingPath + "/" + entity, Result.class);
Dataset<Relation> relations = Utils
.readPath(spark, inputPath + "/relation", Relation.class)
.filter(
(FilterFunction<Relation>) r -> !r.getDataInfo().getDeletedbyinference() &&
!r.getDataInfo().getInvisible()
&& r.getSubRelType().equalsIgnoreCase(ModelConstants.AFFILIATION));
Dataset<Organization> organizations = Utils
.readPath(spark, inputPath + "/organization", Organization.class);
Dataset<ResultOrganizations> resultOrganization = relations
.joinWith(organizations, relations.col("source").equalTo(organizations.col("id")), "left")
.map((MapFunction<Tuple2<Relation, Organization>, ResultOrganizations>) t2 -> {
if (t2._2() != null) {
ResultOrganizations rOrg = new ResultOrganizations();
rOrg.setResultId(t2._1().getTarget());
Affiliation org = new Affiliation();
org.setId(t2._2().getId());
if (Optional.ofNullable(t2._2().getLegalname()).isPresent()) {
org.setName(t2._2().getLegalname().getValue());
} else {
org.setName("");
}
HashMap<String, Set<String>> organizationPids = new HashMap<>();
if (Optional.ofNullable(t2._2().getPid()).isPresent())
t2._2().getPid().forEach(p -> organizationPids
.computeIfAbsent(p.getQualifier().getClassid(), k -> new HashSet<>())
.add(p.getValue()));
List<OrganizationPid> pids = new ArrayList<>();
for (String key : organizationPids.keySet()) {
for (String value : organizationPids.get(key)) {
OrganizationPid pid = new OrganizationPid();
pid.setValue(value);
pid.setType(key);
pids.add(pid);
}
}
org.setPid(pids);
rOrg.setAffiliation(org);
return rOrg;
}
return null;
}, Encoders.bean(ResultOrganizations.class))
.filter(Objects::nonNull);
results
.joinWith(
resultOrganization, results.col("id").equalTo(resultOrganization.col("resultId")), "left")
.groupByKey(
(MapFunction<Tuple2<Result, ResultOrganizations>, String>) t2 -> t2._1().getId(),
Encoders.STRING())
.mapGroups(
(MapGroupsFunction<String, Tuple2<Result, ResultOrganizations>, Result>) (s, it) -> {
Tuple2<Result, ResultOrganizations> first = it.next();
if (first._2() == null) {
return first._1();
}
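// otherwise collect the distinct affiliations of the group, deduplicating by organization id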
Result ret = first._1();
List<Affiliation> affiliation = new ArrayList<>();
Set<String> alreadyInsertedAffiliations = new HashSet<>();
affiliation.add(first._2().getAffiliation());
alreadyInsertedAffiliations.add(first._2().getAffiliation().getId());
it.forEachRemaining(res -> {
if (!alreadyInsertedAffiliations.contains(res._2().getAffiliation().getId())) {
affiliation.add(res._2().getAffiliation());
alreadyInsertedAffiliations.add(res._2().getAffiliation().getId());
}
});
ret.setAffiliation(affiliation);
return ret;
}, Encoders.bean(Result.class))
.write()
.mode(SaveMode.Overwrite)
.option("compression", "gzip")
.json(workingPath + "/affiliation/" + entity);
});
}
}

Some files were not shown because too many files have changed in this diff.