branch cities merged into master

miconis 2019-04-03 12:22:33 +02:00
commit 3018031621
100 changed files with 71452 additions and 220 deletions

1
.gitignore vendored

@@ -28,6 +28,7 @@
*.iml
.DS_Store
**/.DS_Store
# virtual machine crash logs, see http://www.java.com/en/download/help/error_hotspot.xml
hs_err_pid*


@@ -1,3 +1,72 @@
## Dnet-Dedup
...
# Decision Tree for authors deduplication
The decision tree has to be defined in the JSON configuration. The `decisionTree` field of the JSON contains a map organized as follows:
`<String nodeName, TreeNodeDef treeNodeDef>`: the nodeName is the key and the treeNodeDef contains the definition of the node.
In particular, the TreeNodeDef contains:
- List of FieldConf: the list of fields processed by the node. Each field is associated with:
  - field: the name of the field
  - comparator: the name of the comparator to use for that field; it produces a similarity score, or -1 if the comparison is not possible (missing field or insufficient information).
    > Each FieldConf refers to a comparator by name, and that comparator has to be defined. It is sufficient to implement the Comparator interface, which exposes a "compare" method returning the similarity score. The new comparator must be annotated with `@ComparatorClass("name")`, specifying the name used by the FieldConf to access the right comparator.
  - weight: the weight assigned to the similarity score of that comparator during aggregation
  - params: the list of parameters for the comparator
- threshold: the threshold applied to the resulting similarity score of the tree node:
  ```
  if score==-1 --- undefined result
  if score>=th --- positive result
  if score<th  --- negative result
  ```
- aggregation: the type of aggregation applied to the similarity scores of the fields in the list of fields
  - possible values: AVG (average), MAX, MIN, SUM
  - the similarity scores are first multiplied by their weights, then the chosen aggregation is applied
- arcs: define the next node of the tree depending on the result
  - positive: the key of the next node in case of a positive result
  - negative: the key of the next node in case of a negative result
  - undefined: the key of the next node in case of an undefined result
- ignoreMissing: defines the behavior of the tree node in case of a missing field
  > e.g. if a comparator on a particular field produces an undefined result (-1) and ignoreMissing=true, that field is simply ignored; otherwise the score of the entire tree node is considered to be -1.
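The node semantics above (weighting, aggregation, ignoreMissing, threshold) can be summarized in a small Java sketch. It is not the dnet-pace-core implementation: `FieldScore`, `nodeScore` and `evaluate` are hypothetical names, and returning -1 when every field was skipped is an assumption. The undefined case (-1) is checked before the threshold, since -1 would otherwise fall into the negative branch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TreeNodeSketch {

    enum Result { POSITIVE, NEGATIVE, UNDEFINED }
    enum Aggregation { AVG, MAX, MIN, SUM }

    /** Hypothetical holder for one FieldConf outcome: comparator score (-1 = undefined) and weight. */
    static final class FieldScore {
        final double score;
        final double weight;
        FieldScore(double score, double weight) { this.score = score; this.weight = weight; }
    }

    /** Weights each score, then aggregates; returns -1 when the node score is undefined. */
    static double nodeScore(List<FieldScore> fields, Aggregation aggregation, boolean ignoreMissing) {
        List<Double> weighted = new ArrayList<>();
        for (FieldScore f : fields) {
            if (f.score == -1) {
                if (ignoreMissing) continue; // the undefined field is simply ignored
                return -1;                   // one undefined field makes the whole node undefined
            }
            weighted.add(f.score * f.weight);
        }
        if (weighted.isEmpty()) return -1;   // assumption: nothing left to aggregate
        switch (aggregation) {
            case MAX: return weighted.stream().mapToDouble(Double::doubleValue).max().getAsDouble();
            case MIN: return weighted.stream().mapToDouble(Double::doubleValue).min().getAsDouble();
            case SUM: return weighted.stream().mapToDouble(Double::doubleValue).sum();
            default:  return weighted.stream().mapToDouble(Double::doubleValue).average().getAsDouble();
        }
    }

    /** Threshold rule; the undefined case takes precedence over the negative one. */
    static Result evaluate(double score, double threshold) {
        if (score == -1) return Result.UNDEFINED;
        return score >= threshold ? Result.POSITIVE : Result.NEGATIVE;
    }

    public static void main(String[] args) {
        List<FieldScore> fields = Arrays.asList(new FieldScore(0.9, 0.5), new FieldScore(-1, 0.5));
        double lenient = nodeScore(fields, Aggregation.SUM, true);  // only the first field counts
        double strict = nodeScore(fields, Aggregation.SUM, false);  // -1: the missing field poisons the node
        System.out.println(evaluate(lenient, 0.4) + " " + evaluate(strict, 0.4)); // POSITIVE UNDEFINED
    }
}
```

With ignoreMissing=true the same pair of fields yields a positive result, while with ignoreMissing=false it yields an undefined one, which then follows the `undefined` arc.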
In order to support the decision tree, the BlockProcessor has been modified as follows:
- if the decision tree is defined in the JSON configuration, the deduplication process relies on it
- if the decision tree is not defined, the deduplication process works exactly as before (strict conditions, conditions, distance algorithms, etc.)
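When the decision tree is used, a pair of records is classified by following the `positive`/`negative`/`undefined` arcs from node to node. The sketch below shows that walk under stated assumptions: the terminal keys `MATCH` and `NO_MATCH`, the `Node` class, the start key and the two-node sample tree are all hypothetical, since this README does not say how a walk terminates or how the BlockProcessor names its nodes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;

public class DecisionTreeWalkSketch {

    enum Result { POSITIVE, NEGATIVE, UNDEFINED }

    // Hypothetical terminal keys: the README does not say how a walk ends.
    static final String MATCH = "MATCH";
    static final String NO_MATCH = "NO_MATCH";

    /** A node: how to evaluate a pair of values, plus the three arcs. */
    static final class Node {
        final BiFunction<String, String, Result> eval;
        final String positive, negative, undefined;

        Node(BiFunction<String, String, Result> eval, String positive, String negative, String undefined) {
            this.eval = eval;
            this.positive = positive;
            this.negative = negative;
            this.undefined = undefined;
        }
    }

    /** Follows the arcs from `start` until a terminal key is reached. */
    static boolean isMatch(Map<String, Node> tree, String start, String a, String b) {
        String current = start;
        int visited = 0;
        while (!MATCH.equals(current) && !NO_MATCH.equals(current)) {
            if (++visited > tree.size()) throw new IllegalStateException("cycle in decision tree");
            Node node = tree.get(current);
            if (node == null) throw new IllegalArgumentException("unknown node: " + current);
            switch (node.eval.apply(a, b)) {
                case POSITIVE: current = node.positive; break;
                case NEGATIVE: current = node.negative; break;
                default:       current = node.undefined; break;
            }
        }
        return MATCH.equals(current);
    }

    /** Two-node example: exact match first, case-insensitive fallback on a negative result. */
    static Map<String, Node> sampleTree() {
        Map<String, Node> tree = new HashMap<>();
        tree.put("exact", new Node(
                (a, b) -> a.isEmpty() || b.isEmpty() ? Result.UNDEFINED
                        : a.equals(b) ? Result.POSITIVE : Result.NEGATIVE,
                MATCH, "ignoreCase", NO_MATCH));
        tree.put("ignoreCase", new Node(
                (a, b) -> a.equalsIgnoreCase(b) ? Result.POSITIVE : Result.NEGATIVE,
                MATCH, NO_MATCH, NO_MATCH));
        return tree;
    }

    public static void main(String[] args) {
        System.out.println(isMatch(sampleTree(), "exact", "CNR", "cnr")); // true
    }
}
```

The visit counter guards against a misconfigured tree whose arcs form a cycle: an acyclic walk can visit each node at most once.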
# Cities and Keyword identification for organization deduplication
A new comparator (JaroWinklerNormalizedName) has been implemented for the deduplication of organizations. This comparator identifies keywords and cities in the organization name and substitutes them with specific codes.
To this aim, two translation maps have been defined:
- translation_map.csv: contains keyword codes and the keyword in ~10 different languages
- city_map.csv: contains city codes and city names in many different languages
> These CSV files are loaded into a map of the form `<translation, code>`: the key is the translation, and the value is the code associated with the keyword/city.
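A minimal sketch of loading such a file into a `<translation, code>` map. The row layout `code,translation1,translation2,...` and the codes `key::1` and `city::1` are assumptions for illustration only; the actual layout of translation_map.csv and city_map.csv is not described here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class TranslationMapSketch {

    /**
     * Builds a <translation, code> map. Assumed (hypothetical) row format:
     * code,translation1,translation2,... without quoted or escaped commas.
     */
    static Map<String, String> load(Reader csv) {
        Map<String, String> map = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(csv)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] cells = line.split(",");
                if (cells.length < 2) continue; // blank or malformed row
                String code = cells[0].trim();
                for (int i = 1; i < cells.length; i++) {
                    // lowercased so lookups match the lowercased, cleaned organization names
                    String translation = cells[i].trim().toLowerCase(Locale.ROOT);
                    if (!translation.isEmpty()) map.put(translation, code);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return map;
    }

    public static void main(String[] args) {
        String csv = "key::1,University,Universidad,Universitat\n"
                   + "city::1,New York,Nueva York\n";
        Map<String, String> map = load(new StringReader(csv));
        System.out.println(map.get("universidad") + " " + map.get("nueva york")); // key::1 city::1
    }
}
```

Inverting the file (translation as key) makes every lookup during replacement a single hash access, regardless of how many languages a row lists.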
The JaroWinklerNormalizedName comparator searches for keywords and city names in the organization name, substitutes them with their codes, and then applies the Jaro-Winkler similarity function to the resulting strings once the identified codes have been removed.
The process to determine whether two organization names are equal is the following:
```
if (sameCity(ca, cb)) {
    if (sameKeywords(ca, cb)) {
        ca = removeCodes(ca);
        cb = removeCodes(cb);
        if (ca.isEmpty() && cb.isEmpty())
            return 1.0;
        else
            return normalize(ssalgo.score(ca, cb));
    }
}
```
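The pseudocode above can be turned into a runnable Java sketch under several assumptions: codes are whitespace-separated tokens prefixed with the hypothetical `key::` and `city::`, `sameCity`/`sameKeywords` are modeled as equality of the two code sets, the similarity function is injected in place of `normalize(ssalgo.score(...))`, and the `0.0` returned when cities or keywords differ is an assumption, since the pseudocode leaves that branch unspecified.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.function.ToDoubleBiFunction;
import java.util.stream.Collectors;

public class NormalizedNameSketch {

    // Hypothetical code prefixes; the real codes in the CSV maps may look different.
    static final String KEY_PREFIX = "key::";
    static final String CITY_PREFIX = "city::";

    /** Collects the codes with the given prefix found in an already-normalized name. */
    static Set<String> codes(String name, String prefix) {
        return Arrays.stream(name.split("\\s+"))
                .filter(t -> t.startsWith(prefix))
                .collect(Collectors.toSet());
    }

    /** Drops every keyword and city code, keeping the remaining words in order. */
    static String removeCodes(String name) {
        return Arrays.stream(name.split("\\s+"))
                .filter(t -> !t.isEmpty() && !t.startsWith(KEY_PREFIX) && !t.startsWith(CITY_PREFIX))
                .collect(Collectors.joining(" "));
    }

    static double score(String ca, String cb, ToDoubleBiFunction<String, String> similarity) {
        boolean sameCity = codes(ca, CITY_PREFIX).equals(codes(cb, CITY_PREFIX));
        boolean sameKeywords = codes(ca, KEY_PREFIX).equals(codes(cb, KEY_PREFIX));
        if (sameCity && sameKeywords) {
            String a = removeCodes(ca);
            String b = removeCodes(cb);
            if (a.isEmpty() && b.isEmpty()) return 1.0; // the names consist only of codes
            return similarity.applyAsDouble(a, b);
        }
        return 0.0; // assumption: branch not specified in the README pseudocode
    }

    public static void main(String[] args) {
        // Exact-match stand-in; a real setup would plug in a normalized Jaro-Winkler score.
        ToDoubleBiFunction<String, String> exact = (a, b) -> a.equals(b) ? 1.0 : 0.0;
        System.out.println(score("key::1 technologies city::1", "key::1 technologies city::1", exact)); // 1.0
        System.out.println(score("key::1 city::1", "key::1 city::2", exact)); // 0.0
    }
}
```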
For keyword replacement the process is simple: it is sufficient to split the string into tokens (1 token = 1 word) and look up each word in the translation map.
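A minimal sketch of this token-by-token lookup, assuming the name has already been lowercased and cleaned of stopwords, and using a hypothetical code `key::1`; tokens not found in the map are kept unchanged.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class KeywordReplaceSketch {

    /** Replaces every token found in the <translation, code> map with its code. */
    static String replaceKeywords(String cleanedName, Map<String, String> translationMap) {
        return Arrays.stream(cleanedName.trim().split("\\s+"))
                .map(token -> translationMap.getOrDefault(token, token))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        Map<String, String> translationMap = new HashMap<>();
        translationMap.put("university", "key::1");  // hypothetical code
        translationMap.put("universidad", "key::1");
        System.out.println(replaceKeywords("universidad politecnica madrid", translationMap));
        // prints "key::1 politecnica madrid"
    }
}
```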
For city name replacement the process is considerably more complicated: since we cannot know whether the name of a particular city is composed of one or more words, we need to extract all the candidate names from the organization name.
The candidate city names are extracted based on a window size, which limits the number of extracted candidates: every candidate consists of at most `window` adjacent words (4 in the example below).
```
Example:
window = 4
organization name = University of technologies of New York
cleaned organization name (without stopwords and lowercased): university technologies new york
candidates = "university technologies new york", "university technologies new", "technologies new york", "university technologies", "technologies new", "new york", "university", "technologies", "new", "york"
```
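The candidate extraction can be sketched as generating every run of adjacent tokens up to the window size, longest first and left to right within the same length. This is an illustration, not the project's code; `candidates` is a hypothetical name, and for the example above it produces exactly the ten candidates listed, in the same order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CityCandidatesSketch {

    /** All runs of at most `window` adjacent tokens, longest first, left to right within a length. */
    static List<String> candidates(String cleanedName, int window) {
        String[] tokens = cleanedName.trim().split("\\s+");
        List<String> result = new ArrayList<>();
        for (int size = Math.min(window, tokens.length); size >= 1; size--) {
            for (int start = 0; start + size <= tokens.length; start++) {
                result.add(String.join(" ", Arrays.copyOfRange(tokens, start, start + size)));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        candidates("university technologies new york", 4).forEach(System.out::println);
    }
}
```

For n tokens and window w the number of candidates is the sum of (n - k + 1) for k from 1 to min(w, n), so the window keeps the count linear in n instead of quadratic.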
These candidate names are looked up in the city map, starting from the longest, until a name is found. When a name is present in the map, it is replaced with the city code.
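A minimal sketch of the longest-first lookup, assuming the candidates are already ordered longest first and using hypothetical codes. Matching is done on whole tokens, and only the first occurrence of the first candidate found is replaced, so a longer name such as "new york" wins over a shorter entry such as "york".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CityReplaceSketch {

    /** Replaces the first candidate present in the <translation, code> city map with its code. */
    static String replaceCity(String cleanedName, List<String> candidates, Map<String, String> cityMap) {
        List<String> tokens = new ArrayList<>(Arrays.asList(cleanedName.trim().split("\\s+")));
        for (String candidate : candidates) {
            String code = cityMap.get(candidate);
            if (code == null) continue;
            List<String> words = Arrays.asList(candidate.split(" "));
            int at = Collections.indexOfSubList(tokens, words);
            if (at < 0) continue;                        // defensive: candidate not in this name
            tokens.subList(at, at + words.size()).clear(); // drop the words of the city name
            tokens.add(at, code);                        // and put the code in their place
            return String.join(" ", tokens);
        }
        return String.join(" ", tokens);                 // no city name found
    }

    public static void main(String[] args) {
        Map<String, String> cityMap = new HashMap<>();
        cityMap.put("new york", "city::1"); // hypothetical codes
        cityMap.put("york", "city::2");     // a shorter, competing entry
        List<String> candidates = Arrays.asList(
                "technologies new york", "technologies new", "new york", "technologies", "new", "york");
        System.out.println(replaceCity("technologies new york", candidates, cityMap));
        // prints "technologies city::1"
    }
}
```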
powered by D-Net

252
dependencies.txt Normal file

@@ -0,0 +1,252 @@
[INFO] Scanning for projects...
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Build Order:
[INFO]
[INFO] dnet-dedup [pom]
[INFO] dnet-pace-core [jar]
[INFO] dnet-dedup-test [jar]
[INFO]
[INFO] -----------------------< eu.dnetlib:dnet-dedup >------------------------
[INFO] Building dnet-dedup 3.0.3-SNAPSHOT [1/3]
[INFO] --------------------------------[ pom ]---------------------------------
[INFO]
[INFO] --- maven-dependency-plugin:3.0.0:tree (default-cli) @ dnet-dedup ---
[INFO] eu.dnetlib:dnet-dedup:pom:3.0.3-SNAPSHOT
[INFO]
[INFO] ---------------------< eu.dnetlib:dnet-pace-core >----------------------
[INFO] Building dnet-pace-core 3.0.3-SNAPSHOT [2/3]
[INFO] --------------------------------[ jar ]---------------------------------
[INFO]
[INFO] --- maven-dependency-plugin:3.0.0:tree (default-cli) @ dnet-pace-core ---
[INFO] eu.dnetlib:dnet-pace-core:jar:3.0.3-SNAPSHOT
[INFO] +- edu.cmu:secondstring:jar:1.0.0:compile
[INFO] +- com.google.guava:guava:jar:15.0:compile
[INFO] +- com.google.code.gson:gson:jar:2.2.2:compile
[INFO] +- commons-lang:commons-lang:jar:2.6:compile
[INFO] +- commons-io:commons-io:jar:2.4:compile
[INFO] +- commons-collections:commons-collections:jar:3.2.1:compile
[INFO] +- com.googlecode.protobuf-java-format:protobuf-java-format:jar:1.2:compile
[INFO] +- org.antlr:stringtemplate:jar:3.2:compile
[INFO] | \- org.antlr:antlr:jar:2.7.7:compile
[INFO] +- commons-logging:commons-logging:jar:1.1.3:compile
[INFO] +- junit:junit:jar:4.9:test
[INFO] | \- org.hamcrest:hamcrest-core:jar:1.1:test
[INFO] +- org.reflections:reflections:jar:0.9.10:compile
[INFO] | +- org.javassist:javassist:jar:3.19.0-GA:compile
[INFO] | \- com.google.code.findbugs:annotations:jar:2.0.1:compile
[INFO] +- com.fasterxml.jackson.core:jackson-databind:jar:2.6.6:compile
[INFO] | +- com.fasterxml.jackson.core:jackson-annotations:jar:2.6.0:compile
[INFO] | \- com.fasterxml.jackson.core:jackson-core:jar:2.6.6:compile
[INFO] +- org.codehaus.jackson:jackson-mapper-asl:jar:1.9.13:compile
[INFO] | \- org.codehaus.jackson:jackson-core-asl:jar:1.9.13:compile
[INFO] \- org.apache.commons:commons-math3:jar:3.6.1:compile
[INFO]
[INFO] ---------------------< eu.dnetlib:dnet-dedup-test >---------------------
[INFO] Building dnet-dedup-test 3.0.3-SNAPSHOT [3/3]
[INFO] --------------------------------[ jar ]---------------------------------
[INFO]
[INFO] --- maven-dependency-plugin:3.0.0:tree (default-cli) @ dnet-dedup-test ---
[INFO] eu.dnetlib:dnet-dedup-test:jar:3.0.3-SNAPSHOT
[INFO] +- eu.dnetlib:dnet-pace-core:jar:3.0.3-SNAPSHOT:compile
[INFO] | +- edu.cmu:secondstring:jar:1.0.0:compile
[INFO] | +- com.google.guava:guava:jar:15.0:compile
[INFO] | +- com.google.code.gson:gson:jar:2.2.2:compile
[INFO] | +- commons-lang:commons-lang:jar:2.6:compile
[INFO] | +- commons-io:commons-io:jar:2.4:compile
[INFO] | +- commons-collections:commons-collections:jar:3.2.1:compile
[INFO] | +- com.googlecode.protobuf-java-format:protobuf-java-format:jar:1.2:compile
[INFO] | +- org.antlr:stringtemplate:jar:3.2:compile
[INFO] | | \- org.antlr:antlr:jar:2.7.7:compile
[INFO] | +- commons-logging:commons-logging:jar:1.1.3:compile
[INFO] | +- org.reflections:reflections:jar:0.9.10:compile
[INFO] | | +- org.javassist:javassist:jar:3.19.0-GA:compile
[INFO] | | \- com.google.code.findbugs:annotations:jar:2.0.1:compile
[INFO] | +- com.fasterxml.jackson.core:jackson-databind:jar:2.6.6:compile
[INFO] | | +- com.fasterxml.jackson.core:jackson-annotations:jar:2.6.0:compile
[INFO] | | \- com.fasterxml.jackson.core:jackson-core:jar:2.6.6:compile
[INFO] | +- org.codehaus.jackson:jackson-mapper-asl:jar:1.9.13:compile
[INFO] | | \- org.codehaus.jackson:jackson-core-asl:jar:1.9.13:compile
[INFO] | \- org.apache.commons:commons-math3:jar:3.6.1:compile
[INFO] +- eu.dnetlib:dnet-openaire-data-protos:jar:3.9.3-proto250:compile
[INFO] | +- com.google.protobuf:protobuf-java:jar:2.5.0:compile
[INFO] | \- log4j:log4j:jar:1.2.17:compile (version selected from constraint [1.2.17,1.2.17])
[INFO] +- org.apache.spark:spark-core_2.11:jar:2.2.0:provided
[INFO] | +- org.apache.avro:avro:jar:1.7.7:provided
[INFO] | | +- com.thoughtworks.paranamer:paranamer:jar:2.3:provided
[INFO] | | \- org.apache.commons:commons-compress:jar:1.4.1:provided
[INFO] | | \- org.tukaani:xz:jar:1.0:provided
[INFO] | +- org.apache.avro:avro-mapred:jar:hadoop2:1.7.7:provided
[INFO] | | +- org.apache.avro:avro-ipc:jar:1.7.7:provided
[INFO] | | \- org.apache.avro:avro-ipc:jar:tests:1.7.7:provided
[INFO] | +- com.twitter:chill_2.11:jar:0.8.0:provided
[INFO] | | \- com.esotericsoftware:kryo-shaded:jar:3.0.3:provided
[INFO] | | +- com.esotericsoftware:minlog:jar:1.3.0:provided
[INFO] | | \- org.objenesis:objenesis:jar:2.1:provided
[INFO] | +- com.twitter:chill-java:jar:0.8.0:provided
[INFO] | +- org.apache.xbean:xbean-asm5-shaded:jar:4.4:provided
[INFO] | +- org.apache.hadoop:hadoop-client:jar:2.6.5:provided
[INFO] | | +- org.apache.hadoop:hadoop-common:jar:2.6.5:provided
[INFO] | | | +- commons-cli:commons-cli:jar:1.2:provided
[INFO] | | | +- xmlenc:xmlenc:jar:0.52:provided
[INFO] | | | +- commons-httpclient:commons-httpclient:jar:3.1:provided
[INFO] | | | +- commons-configuration:commons-configuration:jar:1.6:provided
[INFO] | | | | +- commons-digester:commons-digester:jar:1.8:provided
[INFO] | | | | | \- commons-beanutils:commons-beanutils:jar:1.7.0:provided
[INFO] | | | | \- commons-beanutils:commons-beanutils-core:jar:1.8.0:provided
[INFO] | | | +- org.apache.hadoop:hadoop-auth:jar:2.6.5:provided
[INFO] | | | | \- org.apache.directory.server:apacheds-kerberos-codec:jar:2.0.0-M15:provided
[INFO] | | | | +- org.apache.directory.server:apacheds-i18n:jar:2.0.0-M15:provided
[INFO] | | | | +- org.apache.directory.api:api-asn1-api:jar:1.0.0-M20:provided
[INFO] | | | | \- org.apache.directory.api:api-util:jar:1.0.0-M20:provided
[INFO] | | | +- org.apache.curator:curator-client:jar:2.6.0:provided
[INFO] | | | \- org.htrace:htrace-core:jar:3.0.4:provided
[INFO] | | +- org.apache.hadoop:hadoop-hdfs:jar:2.6.5:provided
[INFO] | | | +- org.mortbay.jetty:jetty-util:jar:6.1.26:provided
[INFO] | | | \- xerces:xercesImpl:jar:2.9.1:provided
[INFO] | | | \- xml-apis:xml-apis:jar:1.3.04:provided
[INFO] | | +- org.apache.hadoop:hadoop-mapreduce-client-app:jar:2.6.5:provided
[INFO] | | | +- org.apache.hadoop:hadoop-mapreduce-client-common:jar:2.6.5:provided
[INFO] | | | | +- org.apache.hadoop:hadoop-yarn-client:jar:2.6.5:provided
[INFO] | | | | \- org.apache.hadoop:hadoop-yarn-server-common:jar:2.6.5:provided
[INFO] | | | \- org.apache.hadoop:hadoop-mapreduce-client-shuffle:jar:2.6.5:provided
[INFO] | | +- org.apache.hadoop:hadoop-yarn-api:jar:2.6.5:provided
[INFO] | | +- org.apache.hadoop:hadoop-mapreduce-client-core:jar:2.6.5:provided
[INFO] | | | \- org.apache.hadoop:hadoop-yarn-common:jar:2.6.5:provided
[INFO] | | | +- javax.xml.bind:jaxb-api:jar:2.2.2:provided
[INFO] | | | | \- javax.xml.stream:stax-api:jar:1.0-2:provided
[INFO] | | | +- org.codehaus.jackson:jackson-jaxrs:jar:1.9.13:provided
[INFO] | | | \- org.codehaus.jackson:jackson-xc:jar:1.9.13:provided
[INFO] | | +- org.apache.hadoop:hadoop-mapreduce-client-jobclient:jar:2.6.5:provided
[INFO] | | \- org.apache.hadoop:hadoop-annotations:jar:2.6.5:provided
[INFO] | +- org.apache.spark:spark-launcher_2.11:jar:2.2.0:provided
[INFO] | +- org.apache.spark:spark-network-common_2.11:jar:2.2.0:provided
[INFO] | | \- org.fusesource.leveldbjni:leveldbjni-all:jar:1.8:provided
[INFO] | +- org.apache.spark:spark-network-shuffle_2.11:jar:2.2.0:provided
[INFO] | +- org.apache.spark:spark-unsafe_2.11:jar:2.2.0:provided
[INFO] | +- net.java.dev.jets3t:jets3t:jar:0.9.3:provided
[INFO] | | +- org.apache.httpcomponents:httpcore:jar:4.3.3:provided
[INFO] | | +- org.apache.httpcomponents:httpclient:jar:4.3.6:provided
[INFO] | | +- javax.activation:activation:jar:1.1.1:provided
[INFO] | | +- mx4j:mx4j:jar:3.0.2:provided
[INFO] | | +- javax.mail:mail:jar:1.4.7:provided
[INFO] | | +- org.bouncycastle:bcprov-jdk15on:jar:1.51:provided
[INFO] | | \- com.jamesmurty.utils:java-xmlbuilder:jar:1.0:provided
[INFO] | | \- net.iharder:base64:jar:2.3.8:provided
[INFO] | +- org.apache.curator:curator-recipes:jar:2.6.0:provided
[INFO] | | +- org.apache.curator:curator-framework:jar:2.6.0:provided
[INFO] | | \- org.apache.zookeeper:zookeeper:jar:3.4.6:provided
[INFO] | +- javax.servlet:javax.servlet-api:jar:3.1.0:provided
[INFO] | +- org.apache.commons:commons-lang3:jar:3.5:provided
[INFO] | +- com.google.code.findbugs:jsr305:jar:1.3.9:provided
[INFO] | +- org.slf4j:slf4j-api:jar:1.7.16:provided
[INFO] | +- org.slf4j:jul-to-slf4j:jar:1.7.16:provided
[INFO] | +- org.slf4j:jcl-over-slf4j:jar:1.7.16:provided
[INFO] | +- org.slf4j:slf4j-log4j12:jar:1.7.16:provided
[INFO] | +- com.ning:compress-lzf:jar:1.0.3:provided
[INFO] | +- org.xerial.snappy:snappy-java:jar:1.1.2.6:provided
[INFO] | +- net.jpountz.lz4:lz4:jar:1.3.0:provided
[INFO] | +- org.roaringbitmap:RoaringBitmap:jar:0.5.11:provided
[INFO] | +- commons-net:commons-net:jar:2.2:provided
[INFO] | +- org.scala-lang:scala-library:jar:2.11.8:provided
[INFO] | +- org.json4s:json4s-jackson_2.11:jar:3.2.11:provided
[INFO] | | \- org.json4s:json4s-core_2.11:jar:3.2.11:provided
[INFO] | | +- org.json4s:json4s-ast_2.11:jar:3.2.11:provided
[INFO] | | \- org.scala-lang:scalap:jar:2.11.0:provided
[INFO] | | \- org.scala-lang:scala-compiler:jar:2.11.0:provided
[INFO] | | +- org.scala-lang.modules:scala-xml_2.11:jar:1.0.1:provided
[INFO] | | \- org.scala-lang.modules:scala-parser-combinators_2.11:jar:1.0.1:provided
[INFO] | +- org.glassfish.jersey.core:jersey-client:jar:2.22.2:provided
[INFO] | | +- javax.ws.rs:javax.ws.rs-api:jar:2.0.1:provided
[INFO] | | +- org.glassfish.hk2:hk2-api:jar:2.4.0-b34:provided
[INFO] | | | +- org.glassfish.hk2:hk2-utils:jar:2.4.0-b34:provided
[INFO] | | | \- org.glassfish.hk2.external:aopalliance-repackaged:jar:2.4.0-b34:provided
[INFO] | | +- org.glassfish.hk2.external:javax.inject:jar:2.4.0-b34:provided
[INFO] | | \- org.glassfish.hk2:hk2-locator:jar:2.4.0-b34:provided
[INFO] | +- org.glassfish.jersey.core:jersey-common:jar:2.22.2:provided
[INFO] | | +- javax.annotation:javax.annotation-api:jar:1.2:provided
[INFO] | | +- org.glassfish.jersey.bundles.repackaged:jersey-guava:jar:2.22.2:provided
[INFO] | | \- org.glassfish.hk2:osgi-resource-locator:jar:1.0.1:provided
[INFO] | +- org.glassfish.jersey.core:jersey-server:jar:2.22.2:provided
[INFO] | | +- org.glassfish.jersey.media:jersey-media-jaxb:jar:2.22.2:provided
[INFO] | | \- javax.validation:validation-api:jar:1.1.0.Final:provided
[INFO] | +- org.glassfish.jersey.containers:jersey-container-servlet:jar:2.22.2:provided
[INFO] | +- org.glassfish.jersey.containers:jersey-container-servlet-core:jar:2.22.2:provided
[INFO] | +- io.netty:netty-all:jar:4.0.43.Final:provided
[INFO] | +- io.netty:netty:jar:3.9.9.Final:provided
[INFO] | +- com.clearspring.analytics:stream:jar:2.7.0:provided
[INFO] | +- io.dropwizard.metrics:metrics-core:jar:3.1.2:provided
[INFO] | +- io.dropwizard.metrics:metrics-jvm:jar:3.1.2:provided
[INFO] | +- io.dropwizard.metrics:metrics-json:jar:3.1.2:provided
[INFO] | +- io.dropwizard.metrics:metrics-graphite:jar:3.1.2:provided
[INFO] | +- com.fasterxml.jackson.module:jackson-module-scala_2.11:jar:2.6.5:provided
[INFO] | | +- org.scala-lang:scala-reflect:jar:2.11.7:provided
[INFO] | | \- com.fasterxml.jackson.module:jackson-module-paranamer:jar:2.6.5:provided
[INFO] | +- org.apache.ivy:ivy:jar:2.4.0:provided
[INFO] | +- oro:oro:jar:2.0.8:provided
[INFO] | +- net.razorvine:pyrolite:jar:4.13:provided
[INFO] | +- net.sf.py4j:py4j:jar:0.10.4:provided
[INFO] | +- org.apache.spark:spark-tags_2.11:jar:2.2.0:provided
[INFO] | +- org.apache.commons:commons-crypto:jar:1.0.0:provided
[INFO] | \- org.spark-project.spark:unused:jar:1.0.0:provided
[INFO] +- org.apache.spark:spark-graphx_2.11:jar:2.2.0:provided
[INFO] | +- org.apache.spark:spark-mllib-local_2.11:jar:2.2.0:provided
[INFO] | | \- org.scalanlp:breeze_2.11:jar:0.13.1:provided
[INFO] | | +- org.scalanlp:breeze-macros_2.11:jar:0.13.1:provided
[INFO] | | +- net.sf.opencsv:opencsv:jar:2.3:provided
[INFO] | | +- com.github.rwl:jtransforms:jar:2.4.0:provided
[INFO] | | +- org.spire-math:spire_2.11:jar:0.13.0:provided
[INFO] | | | +- org.spire-math:spire-macros_2.11:jar:0.13.0:provided
[INFO] | | | \- org.typelevel:machinist_2.11:jar:0.6.1:provided
[INFO] | | \- com.chuusai:shapeless_2.11:jar:2.3.2:provided
[INFO] | | \- org.typelevel:macro-compat_2.11:jar:1.1.1:provided
[INFO] | +- com.github.fommil.netlib:core:jar:1.1.2:provided
[INFO] | \- net.sourceforge.f2j:arpack_combined_all:jar:0.1:provided
[INFO] +- org.apache.spark:spark-sql_2.11:jar:2.2.0:provided
[INFO] | +- com.univocity:univocity-parsers:jar:2.2.1:provided
[INFO] | +- org.apache.spark:spark-sketch_2.11:jar:2.2.0:provided
[INFO] | +- org.apache.spark:spark-catalyst_2.11:jar:2.2.0:provided
[INFO] | | +- org.codehaus.janino:janino:jar:3.0.0:provided
[INFO] | | +- org.codehaus.janino:commons-compiler:jar:3.0.0:provided
[INFO] | | \- org.antlr:antlr4-runtime:jar:4.5.3:provided
[INFO] | +- org.apache.parquet:parquet-column:jar:1.8.2:provided
[INFO] | | +- org.apache.parquet:parquet-common:jar:1.8.2:provided
[INFO] | | \- org.apache.parquet:parquet-encoding:jar:1.8.2:provided
[INFO] | \- org.apache.parquet:parquet-hadoop:jar:1.8.2:provided
[INFO] | +- org.apache.parquet:parquet-format:jar:2.3.1:provided
[INFO] | \- org.apache.parquet:parquet-jackson:jar:1.8.2:provided
[INFO] +- eu.dnetlib:dnet-openaireplus-mapping-utils:jar:6.2.18:test
[INFO] | +- com.ximpleware:vtd-xml:jar:2.13.4:test (version selected from constraint [2.12,3.0.0))
[INFO] | +- commons-codec:commons-codec:jar:1.9:provided
[INFO] | +- dom4j:dom4j:jar:1.6.1:test (version selected from constraint [1.6.1,1.6.1])
[INFO] | +- net.sf.supercsv:super-csv:jar:2.4.0:test
[INFO] | +- eu.dnetlib:cnr-misc-utils:jar:1.0.6-SNAPSHOT:test (version selected from constraint [1.0.0,2.0.0))
[INFO] | | +- jaxen:jaxen:jar:1.1.6:test
[INFO] | | +- saxonica:saxon:jar:9.1.0.8:test
[INFO] | | +- saxonica:saxon-dom:jar:9.1.0.8:test
[INFO] | | +- jgrapht:jgrapht:jar:0.7.2:test
[INFO] | | +- net.sf.ehcache:ehcache:jar:2.8.0:test
[INFO] | | \- org.springframework:spring-test:jar:4.2.5.RELEASE:test (version selected from constraint [4.2.5.RELEASE,4.2.5.RELEASE])
[INFO] | | \- org.springframework:spring-core:jar:4.2.5.RELEASE:test
[INFO] | +- eu.dnetlib:dnet-hadoop-commons:jar:2.0.2-SNAPSHOT:test (version selected from constraint [2.0.0,3.0.0))
[INFO] | | +- org.apache.hadoop:hadoop-core:jar:2.0.0-mr1-cdh4.7.0:test
[INFO] | | | +- commons-el:commons-el:jar:1.0:test
[INFO] | | | \- hsqldb:hsqldb:jar:1.8.0.10:test
[INFO] | | \- org.springframework:spring-beans:jar:4.2.5.RELEASE:test (version selected from constraint [4.2.5.RELEASE,4.2.5.RELEASE])
[INFO] | \- eu.dnetlib:dnet-index-solr-common:jar:1.3.1:test (version selected from constraint [1.0.0,1.3.1])
[INFO] | \- org.apache.solr:solr-solrj:jar:4.9.0:test
[INFO] | +- org.apache.httpcomponents:httpmime:jar:4.3.1:test
[INFO] | \- org.noggit:noggit:jar:0.5:test
[INFO] \- junit:junit:jar:4.9:test
[INFO] \- org.hamcrest:hamcrest-core:jar:1.1:test
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] dnet-dedup 3.0.3-SNAPSHOT .......................... SUCCESS [ 1.152 s]
[INFO] dnet-pace-core ..................................... SUCCESS [ 0.117 s]
[INFO] dnet-dedup-test 3.0.3-SNAPSHOT ..................... SUCCESS [ 1.407 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 3.216 s
[INFO] Finished at: 2019-03-29T15:02:42+01:00
[INFO] ------------------------------------------------------------------------


@@ -23,6 +23,8 @@
<skip>true</skip>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
@@ -30,11 +32,50 @@
<source>1.8</source>
<target>1.8</target>
<includes>
<include>src/main/java/**/*.java</include>
<include>src/main/java/**/*.scala</include>
<include>**/*.java</include>
</includes>
<!--<includes>-->
<!--<include>src/main/java/**/*.java</include>-->
<!--<include>src/main/java/**/*.scala</include>-->
<!--</includes>-->
</configuration>
</plugin>
<plugin>
<groupId>net.alchim31.maven</groupId>
<artifactId>scala-maven-plugin</artifactId>
<version>4.0.1</version>
<!--<executions>-->
<!--<execution>-->
<!--<goals>-->
<!--<goal>compile</goal>-->
<!--<goal>testCompile</goal>-->
<!--</goals>-->
<!--</execution>-->
<!--</executions>-->
<executions>
<execution>
<id>scala-compile-first</id>
<phase>initialize</phase>
<goals>
<goal>add-source</goal>
<goal>compile</goal>
</goals>
</execution>
<execution>
<id>scala-test-compile</id>
<phase>process-test-resources</phase>
<goals>
<goal>testCompile</goal>
</goals>
</execution>
</executions>
<configuration>
<scalaVersion>${scala.version}</scalaVersion>
</configuration>
</plugin>
</plugins>
</build>
@@ -60,6 +101,11 @@
<artifactId>spark-graphx_2.11</artifactId>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.11</artifactId>
</dependency>
<dependency>
<groupId>eu.dnetlib</groupId>
<artifactId>dnet-openaireplus-mapping-utils</artifactId>
@@ -72,6 +118,22 @@
<scope>test</scope>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-databind</artifactId>
</dependency>
<dependency>
<groupId>org.apache.oozie</groupId>
<artifactId>oozie-client</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
</dependency>
</dependencies>
</project>


@@ -2,10 +2,11 @@ package eu.dnetlib;
import com.fasterxml.jackson.databind.ObjectMapper;
import eu.dnetlib.pace.model.MapDocument;
import eu.dnetlib.pace.util.PaceException;
import org.codehaus.jackson.annotate.JsonIgnore;
import java.io.IOException;
import java.io.Serializable;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
@@ -50,6 +51,7 @@ public class ConnectedComponent implements Serializable {
}
}
@JsonIgnore
public String getMin(List<String> ids){
String min = ids.get(0);
@@ -67,7 +69,7 @@ public class ConnectedComponent implements Serializable {
try {
return mapper.writeValueAsString(this);
} catch (IOException e) {
return null;
throw new PaceException("Failed to create Json: ", e);
}
}
}


@@ -0,0 +1,115 @@
package eu.dnetlib;
import eu.dnetlib.graph.GraphProcessor;
import eu.dnetlib.pace.config.DedupConfig;
import eu.dnetlib.pace.model.MapDocument;
import eu.dnetlib.pace.util.BlockProcessor;
import eu.dnetlib.pace.utils.PaceUtils;
import eu.dnetlib.reporter.SparkCounter;
import eu.dnetlib.reporter.SparkReporter;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.graphx.Edge;
import org.apache.spark.rdd.RDD;
import org.apache.spark.sql.SparkSession;
import scala.Tuple2;
import java.net.URL;
import java.util.stream.Collectors;
public class SparkLocalTest {
public static SparkCounter counter ;
public static void main(String[] args) {
final SparkSession spark = SparkSession
.builder()
.appName("Deduplication")
.master("local[*]")
.getOrCreate();
final JavaSparkContext context = new JavaSparkContext(spark.sparkContext());
final URL dataset = SparkTest.class.getResource("/eu/dnetlib/pace/organization.to.fix.json");
final JavaRDD<String> dataRDD = context.textFile(dataset.getPath());
counter = new SparkCounter(context);
//read the configuration from the classpath
final DedupConfig config = DedupConfig.load(Utility.readFromClasspath("/eu/dnetlib/pace/org.curr.conf"));
BlockProcessor.constructAccumulator(config);
BlockProcessor.accumulators.forEach(acc -> {
final String[] values = acc.split("::");
counter.incrementCounter(values[0], values[1], 0);
});
//create vertexes of the graph: <ID, MapDocument>
JavaPairRDD<String, MapDocument> mapDocs = dataRDD.mapToPair(it -> {
MapDocument mapDocument = PaceUtils.asMapDocument(config, it);
return new Tuple2<>(mapDocument.getIdentifier(), mapDocument);
});
RDD<Tuple2<Object, MapDocument>> vertexes = mapDocs.mapToPair(t -> new Tuple2<Object, MapDocument>( (long) t._1().hashCode(), t._2())).rdd();
//create relations between documents
JavaPairRDD<String, Iterable<MapDocument>> blocks = mapDocs.reduceByKey((a, b) -> a) //the reduce is just to be sure that we haven't document with same id
//Clustering: from <id, doc> to List<groupkey,doc>
.flatMapToPair(a -> {
final MapDocument currentDocument = a._2();
return Utility.getGroupingKeys(config, currentDocument).stream()
.map(it -> new Tuple2<>(it, currentDocument)).collect(Collectors.toList()).iterator();
}).groupByKey();//group documents basing on the key
//print blocks
blocks.foreach(b -> {
String print = b._1() + ": ";
for (MapDocument doc : b._2()) {
print += doc.getIdentifier() + " ";
}
System.out.println(print);
});
//create relations by comparing only elements in the same group
final JavaPairRDD<String, String> relationRDD = blocks.flatMapToPair(it -> {
final SparkReporter reporter = new SparkReporter(counter);
new BlockProcessor(config).process(it._1(), it._2(), reporter);
return reporter.getReport().iterator();
});
final RDD<Edge<String>> edgeRdd = relationRDD.map(it -> new Edge<>(it._1().hashCode(),it._2().hashCode(), "similarTo")).rdd();
JavaRDD<ConnectedComponent> ccs = GraphProcessor.findCCs(vertexes, edgeRdd, 20).toJavaRDD();
final JavaRDD<ConnectedComponent> connectedComponents = ccs.filter(cc -> cc.getDocs().size()>1);
final JavaRDD<ConnectedComponent> nonDeduplicated = ccs.filter(cc -> cc.getDocs().size()==1);
System.out.println("Non duplicates: " + nonDeduplicated.count());
System.out.println("Duplicates: " + connectedComponents.flatMap(cc -> cc.getDocs().iterator()).count());
System.out.println("Connected Components: " + connectedComponents.count());
counter.getAccumulators().values().forEach(it-> System.out.println(it.getGroup()+" "+it.getName()+" -->"+it.value()));
//print deduped
connectedComponents.foreach(cc -> {
System.out.println("cc = " + cc.getId());
for (MapDocument doc: cc.getDocs()) {
System.out.println(doc.getIdentifier() + "; ln: " + doc.getFieldMap().get("legalname").stringValue() + "; sn: " + doc.getFieldMap().get("legalshortname").stringValue());
}
});
//print nondeduped
nonDeduplicated.foreach(cc -> {
System.out.println("nd = " + cc.getId());
System.out.println(cc.getDocs().iterator().next().getFieldMap().get("legalname").stringValue() + "; sn: " + cc.getDocs().iterator().next().getFieldMap().get("legalshortname").stringValue());
});
//print ids
//// ccs.foreach(cc -> System.out.println(cc.getId()));
//// connectedComponents.saveAsTextFile("file:///Users/miconis/Downloads/dumps/organizations_dedup");
}
}


@@ -1,45 +1,41 @@
package eu.dnetlib;
import com.google.common.collect.Sets;
import eu.dnetlib.graph.GraphProcessor;
import eu.dnetlib.pace.clustering.BlacklistAwareClusteringCombiner;
import eu.dnetlib.pace.config.DedupConfig;
import eu.dnetlib.pace.model.MapDocument;
import eu.dnetlib.pace.util.BlockProcessor;
import eu.dnetlib.pace.utils.PaceUtils;
import eu.dnetlib.reporter.SparkCounter;
import eu.dnetlib.reporter.SparkReporter;
import org.apache.commons.io.IOUtils;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.graphx.Edge;
import org.apache.spark.rdd.RDD;
import org.apache.spark.sql.SparkSession;
import scala.Tuple2;
import java.io.IOException;
import java.io.StringWriter;
import java.net.URL;
import java.util.Set;
import java.util.stream.Collectors;
public class SparkTest {
public static SparkCounter counter ;
private static final Log log = LogFactory.getLog(SparkTest.class);
public static void main(String[] args) {
final JavaSparkContext context = new JavaSparkContext(new SparkConf().setAppName("Deduplication").setMaster("local[*]"));
public static void main(String[] args) throws IOException {
final URL dataset = SparkTest.class.getResource("/eu/dnetlib/pace/result.title.stackoverflow.json");
final JavaRDD<String> dataRDD = context.textFile(dataset.getPath());
final SparkSession spark = SparkSession
.builder()
.appName("Deduplication")
.master("yarn")
.getOrCreate();
final JavaSparkContext context = new JavaSparkContext(spark.sparkContext());
final JavaRDD<String> dataRDD = Utility.loadDataFromHDFS(args[0], context);
counter = new SparkCounter(context);
//read the configuration from the classpath
final DedupConfig config = DedupConfig.load(readFromClasspath("/eu/dnetlib/pace/result.full.pace.conf"));
final DedupConfig config = Utility.loadConfigFromHDFS(args[1]);
BlockProcessor.constructAccumulator(config);
BlockProcessor.accumulators.forEach(acc -> {
@@ -61,15 +57,22 @@ public class SparkTest {
//Clustering: from <id, doc> to List<groupkey,doc>
.flatMapToPair(a -> {
final MapDocument currentDocument = a._2();
return getGroupingKeys(config, currentDocument).stream()
return Utility.getGroupingKeys(config, currentDocument).stream()
.map(it -> new Tuple2<>(it, currentDocument)).collect(Collectors.toList()).iterator();
}).groupByKey(); //group documents basing on the key
}).groupByKey();//group documents basing on the key
log.info("blocks to process: " + blocks.count());
//print blocks
blocks.foreach(b -> {
String print = b._1() + ": ";
for (MapDocument doc : b._2()) {
print += doc.getIdentifier() + " ";
}
System.out.println(print);
});
final JavaPairRDD<String, String> relationRDD = blocks
//create relations by comparing only elements in the same group
.flatMapToPair(it -> {
//create relations by comparing only elements in the same group
final JavaPairRDD<String, String> relationRDD = blocks.flatMapToPair(it -> {
final SparkReporter reporter = new SparkReporter(counter);
new BlockProcessor(config).process(it._1(), it._2(), reporter);
return reporter.getReport().iterator();
@@ -88,29 +91,23 @@ public class SparkTest {
counter.getAccumulators().values().forEach(it-> System.out.println(it.getGroup()+" "+it.getName()+" -->"+it.value()));
connectedComponents.foreach(cc -> System.out.println("cc = " + cc.toString() + " size =" + cc.getDocs().size()));
nonDeduplicated.foreach(cc -> System.out.println("nd = " + cc.toString()));
//print deduped
connectedComponents.foreach(cc -> {
System.out.println("cc = " + cc.getId());
for (MapDocument doc: cc.getDocs()) {
System.out.println(doc.getIdentifier() + "; ln: " + doc.getFieldMap().get("legalname").stringValue() + "; sn: " + doc.getFieldMap().get("legalshortname").stringValue());
}
});
//print nondeduped
nonDeduplicated.foreach(cc -> {
System.out.println("nd = " + cc.getId());
System.out.println(cc.getDocs().iterator().next().getFieldMap().get("legalname").stringValue() + "; sn: " + cc.getDocs().iterator().next().getFieldMap().get("legalshortname").stringValue());
});
//print ids
// print ids
// ccs.foreach(cc -> System.out.println(cc.getId()));
// ccs.saveAsTextFile("file:///Users/miconis/Downloads/dumps/organizations_dedup");
// connectedComponents.saveAsTextFile("file:///Users/miconis/Downloads/dumps/organizations_dedup");
}
static String readFromClasspath(final String filename) {
final StringWriter sw = new StringWriter();
try {
IOUtils.copy(SparkTest.class.getResourceAsStream(filename), sw);
return sw.toString();
} catch (final IOException e) {
throw new RuntimeException("cannot load resource from classpath: " + filename);
}
}
static Set<String> getGroupingKeys(DedupConfig conf, MapDocument doc) {
return Sets.newHashSet(BlacklistAwareClusteringCombiner.filterAndCombine(doc, conf));
}
}
}
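The job above follows a blocking pattern. Each document gets one or more grouping keys (`getGroupingKeys`), documents are grouped by key, pairs are compared only inside a block (`BlockProcessor`), and matched pairs end up in connected components. Below is a single-machine sketch of that flow under stated assumptions. `prefixKeys` (3-character word prefixes) is a hypothetical stand-in for the configured clustering functions, and case-insensitive name equality stands in for the comparators. The component step uses union-find, which is not necessarily how the Spark job computes `connectedComponents`.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.function.BiPredicate;
import java.util.function.Function;

/** Sketch: blocking -> in-block comparison -> connected components (hypothetical key and match functions). */
public class DedupPipelineSketch {

    /** Hypothetical key function: 3-character prefix of every lowercased word of length >= 3. */
    public static Set<String> prefixKeys(String name) {
        Set<String> keys = new HashSet<>();
        for (String w : name.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (w.length() >= 3) keys.add(w.substring(0, 3));
        }
        return keys;
    }

    /** Groups document ids by every key they produce; one document may land in several blocks. */
    public static Map<String, List<String>> blocks(Map<String, String> docs, Function<String, Set<String>> keyFn) {
        Map<String, List<String>> blocks = new TreeMap<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            for (String key : keyFn.apply(doc.getValue())) {
                blocks.computeIfAbsent(key, k -> new ArrayList<>()).add(doc.getKey());
            }
        }
        return blocks;
    }

    /** Union-find root lookup with path halving. */
    private static String find(Map<String, String> parent, String x) {
        while (!parent.get(x).equals(x)) {
            parent.put(x, parent.get(parent.get(x)));
            x = parent.get(x);
        }
        return x;
    }

    /** Compares pairs only inside each block and merges matching pairs into components. */
    public static Collection<Set<String>> components(Map<String, String> docs,
                                                     Map<String, List<String>> blocks,
                                                     BiPredicate<String, String> match) {
        Map<String, String> parent = new HashMap<>();
        for (String id : docs.keySet()) parent.put(id, id);
        for (List<String> ids : blocks.values()) {
            for (int i = 0; i < ids.size(); i++) {
                for (int j = i + 1; j < ids.size(); j++) {
                    String a = ids.get(i), b = ids.get(j);
                    // pairs sharing several blocks are compared again; union is idempotent, so this only wastes work
                    if (match.test(docs.get(a), docs.get(b))) parent.put(find(parent, a), find(parent, b));
                }
            }
        }
        Map<String, Set<String>> groups = new TreeMap<>();
        for (String id : docs.keySet()) {
            groups.computeIfAbsent(find(parent, id), r -> new TreeSet<>()).add(id);
        }
        return groups.values();
    }

    public static void main(String[] args) {
        Map<String, String> docs = new TreeMap<>();
        docs.put("a", "Deutsche Forschungsgemeinschaft");
        docs.put("b", "DEUTSCHE FORSCHUNGSGEMEINSCHAFT");
        docs.put("c", "Research Promotion Foundation");
        docs.put("d", "RESEARCH PROMOTION FOUNDATION");
        docs.put("e", "Universita degli Studi di Torino");
        System.out.println(components(docs, blocks(docs, DedupPipelineSketch::prefixKeys), String::equalsIgnoreCase));
        // prints [[a, b], [c, d], [e]]
    }
}
```

Comparing only within blocks is what keeps the number of comparisons manageable. The trade-off is that two duplicates sharing no key are never compared. This is presumably why the organization configuration combines several clustering functions over `legalname` and `websiteurl`.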

@@ -0,0 +1,50 @@
package eu.dnetlib;
import com.google.common.collect.Sets;
import eu.dnetlib.pace.clustering.BlacklistAwareClusteringCombiner;
import eu.dnetlib.pace.config.DedupConfig;
import eu.dnetlib.pace.model.MapDocument;
import org.apache.commons.io.IOUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import java.io.IOException;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.util.Set;
public class Utility {
public static JavaRDD<String> loadDataFromHDFS(String path, JavaSparkContext context) {
return context.textFile(path);
}
public static DedupConfig loadConfigFromHDFS(String path) throws IOException {
Configuration conf = new Configuration();
// conf.set("fs.defaultFS", "");
FileSystem fileSystem = FileSystem.get(conf);
// FileSystem.open already returns an FSDataInputStream; try-with-resources closes it after reading
try (FSDataInputStream inputStream = fileSystem.open(new Path(path))) {
return DedupConfig.load(IOUtils.toString(inputStream, StandardCharsets.UTF_8.name()));
}
}
static String readFromClasspath(final String filename) {
final StringWriter sw = new StringWriter();
try {
IOUtils.copy(Utility.class.getResourceAsStream(filename), sw);
return sw.toString();
} catch (final IOException e) {
throw new RuntimeException("cannot load resource from classpath: " + filename, e);
}
}
static Set<String> getGroupingKeys(DedupConfig conf, MapDocument doc) {
return Sets.newHashSet(BlacklistAwareClusteringCombiner.filterAndCombine(doc, conf));
}
}

@@ -32,8 +32,7 @@ public class OAFProtoUtils {
}
public static FieldTypeProtos.Qualifier.Builder getQualifier(final String classname, final String schemename) {
return
FieldTypeProtos.Qualifier.newBuilder().setClassid(classname).setClassname(classname).setSchemeid(schemename).setSchemename(schemename);
return FieldTypeProtos.Qualifier.newBuilder().setClassid(classname).setClassname(classname).setSchemeid(schemename).setSchemename(schemename);
}
public static OafProtos.OafEntity.Builder oafEntity(final String id, final eu.dnetlib.data.proto.TypeProtos.Type type) {

@@ -0,0 +1,2 @@
{ "type": 30, "id": "30|author::id1", "person": { "metadata":{"orcid": "orcid1", "fullname": "smith, john", "firstname": "john", "lastname": "smith", "pubID": "pubid1", "pubDOI": "pubdoi1", "coauthors": ["la bruzzo, sandro", "atzori, claudio", "baglioni, miriam", "bardi, alessia"], "topics": [0.0,0.0,0.0], "rank":1, "area":"1"}}}
{ "type": 30, "id": "30|author::id2", "person": { "metadata":{"orcid": "", "fullname": "smith, john", "firstname": "john", "lastname": "smith", "pubID": "pubid2", "pubDOI": "pubdoi2", "coauthors": ["la bruzzo, sandro", "atzori, claudio", "baglioni, miriam", "bardi, alessia"], "topics": [0.0,0.0,0.0], "rank":3, "area":"1"}}}

@@ -0,0 +1,40 @@
{
"wf" : {
"threshold" : "0.99",
"dedupRun" : "001",
"entityType" : "person",
"orderField" : "fullname",
"queueMaxSize" : "2000",
"groupMaxSize" : "10",
"slidingWindowSize" : "200",
"rootBuilder" : [ "person" ],
"includeChildren" : "true"
},
"pace": {
"clustering": [
{"name": "personClustering", "fields": ["fullname"], "params": {}}
],
"conditions": [],
"decisionTree": {
"start": {"fields": [{"field":"pubID", "comparator":"exactMatch", "weight":1.0, "params":{}}], "threshold":1.0, "aggregation": "SUM", "positive":"NO_MATCH", "negative":"layer2", "undefined": "layer2", "ignoreMissing": "false"},
"layer2": {"fields": [{"field":"orcid", "comparator":"exactMatch", "weight":1.0, "params":{}}], "threshold":1.0, "aggregation": "SUM", "positive":"ORCID_MATCH", "negative":"NO_MATCH", "undefined": "layer3", "ignoreMissing": "false"},
"layer3": {"fields": [{"field":"firstname", "comparator":"similar", "weight":1.0, "params":{}}], "threshold":0.7, "aggregation": "SUM", "positive":"layer4", "negative":"NO_MATCH", "undefined": "layer4", "ignoreMissing": "false"},
"layer4": {"fields": [{"field":"coauthors", "comparator":"coauthorsMatch", "weight":1.0, "params":{"minCoauthors":6, "maxCoauthors": 200}}], "threshold":5.0, "aggregation": "SUM", "positive":"COAUTHORS_MATCH", "negative":"NO_MATCH", "undefined": "layer5", "ignoreMissing": "false"},
"layer5": {"fields": [{"field":"area", "comparator":"exactMatch", "weight":1.0, "params":{}}], "threshold":1.0, "aggregation": "SUM", "positive":"layer6", "negative":"NO_MATCH", "undefined": "NO_MATCH", "ignoreMissing": "false"},
"layer6": {"fields": [{"field":"topics", "comparator":"topicsMatch", "weight":1.0, "params":{}}], "threshold":0.7, "aggregation": "SUM", "positive":"TOPICS_MATCH", "negative":"NO_MATCH", "undefined": "NO_MATCH", "ignoreMissing": "false"}
},
"model": [
{"name": "fullname", "algo": "Null", "type": "String", "weight": "0", "ignoreMissing": "false", "path": "person/metadata/fullname"},
{"name": "firstname", "algo": "Null", "type": "String", "weight": "0", "ignoreMissing": "false", "path": "person/metadata/firstname"},
{"name": "lastname", "algo": "Null", "type": "String", "weight": "0", "ignoreMissing": "false", "path": "person/metadata/lastname"},
{"name": "coauthors", "algo": "Null", "type": "String", "weight": "0", "ignoreMissing": "false", "path": "person/metadata/coauthors"},
{"name": "orcid", "algo": "Null", "type": "String", "weight": "0", "ignoreMissing": "false", "path": "person/metadata/orcid"},
{"name": "topics", "algo": "Null", "type": "String", "weight": "0", "ignoreMissing": "false", "path": "person/metadata/topics"},
{"name": "pubID", "algo": "Null", "type": "String", "weight": "0", "ignoreMissing": "false", "path": "person/metadata/pubID"},
{"name": "pubDOI", "algo": "Null", "type": "String", "weight": "0", "ignoreMissing": "false", "path": "person/metadata/pubDOI"},
{"name": "rank", "algo": "Null", "type": "Int", "weight": "0", "ignoreMissing": "false", "path": "person/metadata/rank"},
{"name": "area", "algo": "Null", "type": "String", "weight": "0", "ignoreMissing": "false", "path": "person/metadata/area"}
],
"blacklists": {}
}
}
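The `decisionTree` above chains nodes through their `positive`, `negative` and `undefined` arcs. The walk ends at a label that is not itself a node (`NO_MATCH`, `ORCID_MATCH`, `COAUTHORS_MATCH`, `TOPICS_MATCH`).

The sketch below shows that traversal, following the rules in the README:
- Each field score is weighted, then aggregated with AVG, MAX, MIN or SUM.
- `score >= threshold` is positive, `-1` is undefined, and anything else is negative.

Several parts are assumptions rather than facts from the source:
- A node is treated as undefined as soon as any of its comparators returns -1. The per-node `ignoreMissing` flag is not modelled.
- The walk starts at `start`.
- `exactMatch` is a hypothetical stand-in for the real comparator.
- `FieldConf`/`TreeNode` only loosely mirror the README's `FieldConf`/`TreeNodeDef`.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

/** Simplified, hypothetical evaluator for a "decisionTree" section; not the pace implementation. */
public class DecisionTreeSketch {

    enum Aggregation { AVG, MAX, MIN, SUM }

    /** Score a comparator returns when the comparison is not possible. */
    static final double UNDEFINED = -1.0;

    /** A field, the comparator applied to it and the weight of the resulting score. */
    static final class FieldConf {
        final String field;
        final BiFunction<String, String, Double> comparator;
        final double weight;
        FieldConf(String field, BiFunction<String, String, Double> comparator, double weight) {
            this.field = field; this.comparator = comparator; this.weight = weight;
        }
    }

    /** A node: its fields, threshold, aggregation and the three outgoing arcs. */
    public static final class TreeNode {
        final List<FieldConf> fields;
        final double threshold;
        final Aggregation aggregation;
        final String positive, negative, undefined;
        TreeNode(List<FieldConf> fields, double threshold, Aggregation aggregation,
                 String positive, String negative, String undefined) {
            this.fields = fields; this.threshold = threshold; this.aggregation = aggregation;
            this.positive = positive; this.negative = negative; this.undefined = undefined;
        }
    }

    /** Weights each field score and aggregates; assumption: one undefined field makes the node undefined. */
    static double score(TreeNode node, Map<String, String> a, Map<String, String> b) {
        double sum = 0, max = Double.NEGATIVE_INFINITY, min = Double.POSITIVE_INFINITY;
        for (FieldConf f : node.fields) {
            double s = f.comparator.apply(a.get(f.field), b.get(f.field));
            if (s == UNDEFINED) return UNDEFINED;
            double weighted = s * f.weight;
            sum += weighted;
            max = Math.max(max, weighted);
            min = Math.min(min, weighted);
        }
        switch (node.aggregation) {
            case AVG: return sum / node.fields.size();
            case MAX: return max;
            case MIN: return min;
            default:  return sum;
        }
    }

    /** Follows arcs from "start" until the current label is not a node name. */
    public static String run(Map<String, TreeNode> tree, Map<String, String> a, Map<String, String> b) {
        String current = "start";
        for (int steps = 0; tree.containsKey(current); steps++) {
            if (steps > tree.size()) throw new IllegalStateException("cycle in decision tree");
            TreeNode node = tree.get(current);
            double s = score(node, a, b);
            if (s == UNDEFINED) current = node.undefined;
            else if (s >= node.threshold) current = node.positive;
            else current = node.negative;
        }
        return current;
    }

    /** Hypothetical exactMatch: undefined if either value is missing, else 1.0 or 0.0. */
    static double exactMatch(String x, String y) {
        if (x == null || y == null || x.isEmpty() || y.isEmpty()) return UNDEFINED;
        return x.equals(y) ? 1.0 : 0.0;
    }

    /** First two nodes of the authors tree; layer2's undefined arc is cut short to NO_MATCH here. */
    public static Map<String, TreeNode> exampleTree() {
        Map<String, TreeNode> tree = new HashMap<>();
        tree.put("start", new TreeNode(List.of(new FieldConf("pubID", DecisionTreeSketch::exactMatch, 1.0)),
                1.0, Aggregation.SUM, "NO_MATCH", "layer2", "layer2"));
        tree.put("layer2", new TreeNode(List.of(new FieldConf("orcid", DecisionTreeSketch::exactMatch, 1.0)),
                1.0, Aggregation.SUM, "ORCID_MATCH", "NO_MATCH", "NO_MATCH"));
        return tree;
    }

    public static void main(String[] args) {
        Map<String, String> a = Map.of("pubID", "pubid1", "orcid", "orcid1");
        Map<String, String> b = Map.of("pubID", "pubid2", "orcid", "orcid1");
        System.out.println(run(exampleTree(), a, b)); // prints ORCID_MATCH
    }
}
```

With this two-node excerpt, records with different `pubID`s but the same `orcid` end in `ORCID_MATCH`. Records sharing a `pubID` stop at `NO_MATCH` in the `start` node, as its `positive` arc in the configuration prescribes.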

@@ -0,0 +1,36 @@
{
"wf" : {
"threshold" : "0.9",
"dedupRun" : "001",
"entityType" : "organization",
"orderField" : "legalname",
"queueMaxSize" : "2000",
"groupMaxSize" : "10",
"slidingWindowSize" : "200",
"rootBuilder" : [ "organization", "projectOrganization_participation_isParticipant", "datasourceOrganization_provision_isProvidedBy" ],
"includeChildren" : "true"
},
"pace" : {
"clustering" : [
{ "name" : "sortedngrampairs", "fields" : [ "legalname" ], "params" : { "max" : 2, "ngramLen" : "3"} },
{ "name" : "suffixprefix", "fields" : [ "legalname" ], "params" : { "max" : 1, "len" : "3" } },
{ "name" : "urlclustering", "fields" : [ "websiteurl" ], "params" : { } }
],
"strictConditions" : [
{ "name" : "exactMatch", "fields" : [ "gridid" ] }
],
"conditions" : [
{ "name" : "exactMatch", "fields" : [ "country" ] },
{ "name" : "DomainExactMatch", "fields" : [ "websiteurl" ] }
],
"model" : [
{ "name" : "legalname", "algo" : "Null", "type" : "String", "weight" : "0", "ignoreMissing" : "false", "path" : "organization/metadata/legalname/value" },
{ "name" : "country", "algo" : "Null", "type" : "String", "weight" : "0", "ignoreMissing" : "true", "path" : "organization/metadata/country/classid" },
{ "name" : "legalshortname", "algo" : "JaroWinklerNormalizedName", "type" : "String", "weight" : "0.1", "ignoreMissing" : "true", "path" : "organization/metadata/legalshortname/value" },
{ "name" : "legalname", "algo" : "JaroWinklerNormalizedName", "type" : "String", "weight" : "0.9", "ignoreMissing" : "false", "path" : "organization/metadata/legalname/value", "params" : {"windowSize" : 4, "threshold" : 0.5} },
{ "name" : "websiteurl", "algo" : "Null", "type" : "URL", "weight" : "0", "ignoreMissing" : "true", "path" : "organization/metadata/websiteurl/value", "params" : { "host" : 0.5, "path" : 0.5 } },
{ "name" : "gridid", "algo" : "Null", "type" : "String", "weight" : "0.0", "ignoreMissing" : "true", "path" : "pid[qualifier#classid = {grid}]/value" }
],
"blacklists" : { }
}
}

@@ -20,10 +20,12 @@
{ "name" : "exactMatch", "fields" : [ "country" ] },
{ "name" : "DomainExactMatch", "fields" : [ "websiteurl" ] }
],
"decisionTree": {},
"model" : [
{ "name" : "legalname", "algo" : "Null", "type" : "String", "weight" : "0", "ignoreMissing" : "false", "path" : "organization/metadata/legalname/value" },
{ "name" : "country", "algo" : "Null", "type" : "String", "weight" : "0", "ignoreMissing" : "true", "path" : "organization/metadata/country/classid" },
{ "name" : "legalshortname", "algo" : "JaroWinkler", "type" : "String", "weight" : "0.3", "ignoreMissing" : "true", "path" : "organization/metadata/legalshortname/value" },
{ "name" : "legalname", "algo" : "JaroWinklerNormalizedName", "type" : "String", "weight" : "0.7", "ignoreMissing" : "false", "path" : "organization/metadata/legalname/value", "length" : 5 },
{ "name" : "legalname", "algo" : "JaroWinklerNormalizedName", "type" : "String", "weight" : "0.7", "ignoreMissing" : "false", "path" : "organization/metadata/legalname/value", "params" : { "windowSize" : 4 } },
{ "name" : "websiteurl", "algo" : "Null", "type" : "URL", "weight" : "0", "ignoreMissing" : "true", "path" : "organization/metadata/websiteurl/value", "params" : { "host" : 0.5, "path" : 0.5 } }
],
"blacklists" : { }

@@ -0,0 +1,24 @@
{"dateoftransformation": "2018-09-13", "originalId": ["opendoar____::Fonds_zur_F\u00f6rderung_der_wissenschaftlichen_Forschung_(Austrian_Science_Fund)"], "collectedfrom": [{"value": "OpenDOAR", "key": "10|openaire____::47ce9e9f4fad46e732cff06419ecaabb"}], "organization": {"metadata": {"eclegalbody": {"value": "false"}, "eclegalperson": {"value": "false"}, "ecinternationalorganization": {"value": "false"}, "legalshortname": {"value": "FWF"}, "ecresearchorganization": {"value": "false"}, "ecnonprofit": {"value": "false"}, "ecenterprise": {"value": "false"}, "websiteurl": {"value": "http://www.fwf.ac.at/"}, "ecnutscode": {"value": "false"}, "ecinternationalorganizationeurinterests": {"value": "false"}, "legalname": {"value": "Fonds zur F\u00f6rderung der wissenschaftlichen Forschung (Austrian Science Fund)"}, "country": {"classid": "AT", "classname": "Austria", "schemename": "dnet:countries", "schemeid": "dnet:countries"}, "echighereducation": {"value": "false"}, "ecsmevalidated": {"value": "false"}}}, "dateofcollection": "2015-08-24", "type": 20, "id": "20|opendoar____::77e7cd67c60d0c18aa835ea6ea58122c"}
{"dateoftransformation": "2018-12-15", "originalId": ["corda__h2020::998735960"], "collectedfrom": [{"value": "CORDA - COmmon Research DAta Warehouse - Horizon 2020", "key": "10|openaire____::a55eb91348674d853191f4f4fd73d078"}], "organization": {"metadata": {"eclegalbody": {"value": "true"}, "eclegalperson": {"value": "true"}, "ecinternationalorganization": {"value": "false"}, "legalshortname": {"value": "FWF"}, "ecresearchorganization": {"value": "false"}, "ecnonprofit": {"value": "true"}, "ecenterprise": {"value": "false"}, "websiteurl": {"value": "http://www.fwf.ac.at"}, "ecnutscode": {"value": "false"}, "ecinternationalorganizationeurinterests": {"value": "false"}, "legalname": {"value": "FONDS ZUR F\u00d6RDERUNG DER WISSENSCHAFTLICHEN FORSCHUNG"}, "country": {"classid": "AT", "classname": "Austria", "schemename": "dnet:countries", "schemeid": "dnet:countries"}, "echighereducation": {"value": "false"}, "ecsmevalidated": {"value": "false"}}}, "dateofcollection": "2018-03-12", "type": 20, "id": "20|corda__h2020::83f579158b682262181b9a8ffdfa1124"}
{"dateoftransformation": "2018-11-20", "originalId": ["corda_______::998735960"], "collectedfrom": [{"value": "CORDA - COmmon Research DAta Warehouse", "key": "10|openaire____::b30dac7baac631f3da7c2bb18dd9891f"}], "organization": {"metadata": {"eclegalbody": {"value": "true"}, "eclegalperson": {"value": "true"}, "ecinternationalorganization": {"value": "false"}, "legalshortname": {"value": "FWF"}, "ecresearchorganization": {"value": "false"}, "ecnonprofit": {"value": "true"}, "ecenterprise": {"value": "false"}, "websiteurl": {"value": "http://www.fwf.ac.at"}, "ecnutscode": {"value": "false"}, "ecinternationalorganizationeurinterests": {"value": "false"}, "legalname": {"value": "FONDS ZUR F\u00d6RDERUNG DER WISSENSCHAFTLICHEN FORSCHUNG"}, "country": {"classid": "AT", "classname": "Austria", "schemename": "dnet:countries", "schemeid": "dnet:countries"}, "echighereducation": {"value": "false"}}}, "dateofcollection": "2018-03-12", "type": 20, "id": "20|corda_______::83f579158b682262181b9a8ffdfa1124"}
{"dateoftransformation": "2018-09-27", "originalId": ["re3data_____::9f4430cdb5474d6db4bf84834533a7c9"], "collectedfrom": [{"value": "Registry of Research Data Repository", "key": "10|openaire____::21f8a223b9925c2f87c404096080b046"}], "organization": {"metadata": {"eclegalbody": {"value": "false"}, "eclegalperson": {"value": "false"}, "ecinternationalorganization": {"value": "false"}, "legalshortname": {"value": "FWF"}, "ecresearchorganization": {"value": "false"}, "ecnonprofit": {"value": "false"}, "ecenterprise": {"value": "false"}, "websiteurl": {"value": "https://www.fwf.ac.at/en/"}, "ecnutscode": {"value": "false"}, "ecinternationalorganizationeurinterests": {"value": "false"}, "legalname": {"value": "Fonds zur F\u00f6rderung der wissenschaftlichen Forschung"}, "country": {"classid": "AT", "classname": "Austria", "schemename": "dnet:countries", "schemeid": "dnet:countries"}, "echighereducation": {"value": "false"}, "ecsmevalidated": {"value": "false"}}}, "dateofcollection": "2018-09-27", "type": 20, "id": "20|re3data_____::a3ac0376cc2a582357d821cec70a3e5b"}
{"dateoftransformation": "2018-12-15", "originalId": ["corda__h2020::999861936"], "collectedfrom": [{"value": "CORDA - COmmon Research DAta Warehouse - Horizon 2020", "key": "10|openaire____::a55eb91348674d853191f4f4fd73d078"}], "organization": {"metadata": {"eclegalbody": {"value": "true"}, "eclegalperson": {"value": "true"}, "ecinternationalorganization": {"value": "false"}, "legalshortname": {"value": "UNITO"}, "ecresearchorganization": {"value": "true"}, "ecnonprofit": {"value": "true"}, "ecenterprise": {"value": "false"}, "websiteurl": {"value": "http://www.unito.it"}, "ecnutscode": {"value": "false"}, "ecinternationalorganizationeurinterests": {"value": "false"}, "legalname": {"value": "UNIVERSITA DEGLI STUDI DI TORINO"}, "country": {"classid": "IT", "classname": "Italy", "schemename": "dnet:countries", "schemeid": "dnet:countries"}, "echighereducation": {"value": "true"}, "ecsmevalidated": {"value": "false"}}}, "dateofcollection": "2018-03-12", "type": 20, "id": "20|corda__h2020::ef77a7bbe5796b0b47aa60947a5c6f41"}
{"dateoftransformation": "2018-11-20", "originalId": ["corda_______::999861936"], "collectedfrom": [{"value": "CORDA - COmmon Research DAta Warehouse", "key": "10|openaire____::b30dac7baac631f3da7c2bb18dd9891f"}], "organization": {"metadata": {"eclegalbody": {"value": "true"}, "eclegalperson": {"value": "true"}, "ecinternationalorganization": {"value": "false"}, "legalshortname": {"value": "UNITO"}, "ecresearchorganization": {"value": "true"}, "ecnonprofit": {"value": "true"}, "ecenterprise": {"value": "false"}, "websiteurl": {"value": "http://www.unito.it"}, "ecnutscode": {"value": "false"}, "ecinternationalorganizationeurinterests": {"value": "false"}, "legalname": {"value": "UNIVERSITA DEGLI STUDI DI TORINO"}, "country": {"classid": "IT", "classname": "Italy", "schemename": "dnet:countries", "schemeid": "dnet:countries"}, "echighereducation": {"value": "true"}, "ecsmevalidated": {"value": "false"}}}, "dateofcollection": "2018-03-12", "type": 20, "id": "20|corda_______::ef77a7bbe5796b0b47aa60947a5c6f41"}
{"dateoftransformation": "2018-09-13", "originalId": ["nih_________::UNIVERSITA_DI_TORINO"], "collectedfrom": [{"value": "NIH - National Institutes of Health", "key": "10|openaire____::9e9e8c76d739212c63eff362e321ba33"}], "organization": {"metadata": {"eclegalbody": {"value": "false"}, "eclegalperson": {"value": "false"}, "ecinternationalorganization": {"value": "false"}, "ecnonprofit": {"value": "false"}, "ecresearchorganization": {"value": "false"}, "ecenterprise": {"value": "false"}, "ecnutscode": {"value": "false"}, "ecinternationalorganizationeurinterests": {"value": "false"}, "legalname": {"value": "UNIVERSITA DI TORINO"}, "echighereducation": {"value": "false"}, "ecsmevalidated": {"value": "false"}}}, "dateofcollection": "2016-07-11", "type": 20, "id": "20|nih_________::fdd37fcef9df7c69ae7d620bf21ab272"}
{"dateoftransformation": "2018-09-19", "originalId": ["doajarticles::Universit\u00e0_degli_Studi_di_Torino"], "collectedfrom": [{"value": "DOAJ-Articles", "key": "10|driver______::bee53aa31dc2cbb538c10c2b65fa5824"}], "organization": {"metadata": {"eclegalbody": {"value": "false"}, "eclegalperson": {"value": "false"}, "ecinternationalorganization": {"value": "false"}, "legalshortname": {"value": "Universit\u00e0 degli Studi di Torino"}, "ecresearchorganization": {"value": "false"}, "ecnonprofit": {"value": "false"}, "ecenterprise": {"value": "false"}, "ecnutscode": {"value": "false"}, "ecinternationalorganizationeurinterests": {"value": "false"}, "legalname": {"value": "Universit\u00e0 degli Studi di Torino"}, "country": {"classid": "IT", "classname": "Italy", "schemename": "dnet:countries", "schemeid": "dnet:countries"}, "echighereducation": {"value": "false"}, "ecsmevalidated": {"value": "false"}}}, "dateofcollection": "2018-09-19", "type": 20, "id": "20|doajarticles::f7ef827f8fe1d870b6464ef1affc9605"}
{"dateoftransformation": "2018-11-12", "originalId": ["opendoar____::Universit\u00e0_degli_Studi_di_Torino"], "collectedfrom": [{"value": "OpenDOAR", "key": "10|openaire____::47ce9e9f4fad46e732cff06419ecaabb"}], "organization": {"metadata": {"eclegalbody": {"value": "false"}, "eclegalperson": {"value": "false"}, "ecinternationalorganization": {"value": "false"}, "ecnonprofit": {"value": "false"}, "ecresearchorganization": {"value": "false"}, "ecenterprise": {"value": "false"}, "websiteurl": {"value": "http://www.unito.it/"}, "ecnutscode": {"value": "false"}, "ecinternationalorganizationeurinterests": {"value": "false"}, "legalname": {"value": "Universit\u00e0 degli Studi di Torino"}, "country": {"classid": "IT", "classname": "Italy", "schemename": "dnet:countries", "schemeid": "dnet:countries"}, "echighereducation": {"value": "false"}, "ecsmevalidated": {"value": "false"}}}, "dateofcollection": "2018-11-12", "type": 20, "id": "20|opendoar____::f7ef827f8fe1d870b6464ef1affc9605"}
{"collectedfrom": [{"value": "GRID - Global Research Identifier Database", "key": "10|openaire____::ff4a008470319a22d9cf3d14af485977"}], "organization": {"metadata": {"legalshortname": {"value": "RPF"}, "websiteurl": {"value": "http://www.research.org.cy/EN/index.html/"}, "country": {"classid": "CY", "classname": "Cyprus", "schemename": "dnet:countries", "schemeid": "dnet:countries"}, "legalname": {"value": "RPF"}}}, "pid": [{"qualifier": {"classid": "grid", "classname": "grid", "schemename": "dnet:pid_types", "schemeid": "dnet:pid_types"}, "value": "grid.14751.36"}], "id": "20|grid________::4f35352983a82950563eadfea49dc867", "type": 20}
{"collectedfrom": [{"value": "GRID - Global Research Identifier Database", "key": "10|openaire____::ff4a008470319a22d9cf3d14af485977"}], "organization": {"metadata": {"legalshortname": {"value": "RPF"}, "websiteurl": {"value": "http://www.research.org.cy/EN/index.html/"}, "country": {"classid": "CY", "classname": "Cyprus", "schemename": "dnet:countries", "schemeid": "dnet:countries"}, "legalname": {"value": "Research Promotion Foundation"}}}, "pid": [{"qualifier": {"classid": "grid", "classname": "grid", "schemename": "dnet:pid_types", "schemeid": "dnet:pid_types"}, "value": "grid.14751.36"}], "id": "20|grid________::a42b3c67ea94b54ee941fb42fefd51d6", "type": 20}
{"dateoftransformation": "2018-08-08", "originalId": ["corda__h2020::999946035"], "collectedfrom": [{"value": "CORDA - COmmon Research DAta Warehouse - Horizon 2020", "key": "10|openaire____::a55eb91348674d853191f4f4fd73d078"}], "organization": {"metadata": {"eclegalbody": {"value": "false"}, "eclegalperson": {"value": "true"}, "ecinternationalorganization": {"value": "false"}, "legalshortname": {"value": "RPF"}, "ecresearchorganization": {"value": "false"}, "ecnonprofit": {"value": "true"}, "ecenterprise": {"value": "false"}, "websiteurl": {"value": "http://www.research.org.cy"}, "ecnutscode": {"value": "false"}, "ecinternationalorganizationeurinterests": {"value": "false"}, "legalname": {"value": "IDRYMA PROOTHISIS EREVNAS"}, "country": {"classid": "CY", "classname": "Cyprus", "schemename": "dnet:countries", "schemeid": "dnet:countries"}, "echighereducation": {"value": "false"}, "ecsmevalidated": {"value": "false"}}}, "dateofcollection": "2016-01-21", "type": 20, "id": "20|corda__h2020::a16918f80d830bf2b6daa5ec304f0e31"}
{"dateoftransformation": "2018-08-08", "originalId": ["corda_______::999946035"], "collectedfrom": [{"value": "CORDA - COmmon Research DAta Warehouse", "key": "10|openaire____::b30dac7baac631f3da7c2bb18dd9891f"}], "organization": {"metadata": {"eclegalbody": {"value": "false"}, "eclegalperson": {"value": "true"}, "ecinternationalorganization": {"value": "false"}, "legalshortname": {"value": "RPF"}, "ecresearchorganization": {"value": "false"}, "ecnonprofit": {"value": "true"}, "ecenterprise": {"value": "false"}, "websiteurl": {"value": "http://www.research.org.cy"}, "ecnutscode": {"value": "false"}, "ecinternationalorganizationeurinterests": {"value": "false"}, "legalname": {"value": "RESEARCH PROMOTION FOUNDATION"}, "country": {"classid": "CY", "classname": "Cyprus", "schemename": "dnet:countries", "schemeid": "dnet:countries"}, "echighereducation": {"value": "false"}, "ecsmevalidated": {"value": "false"}}}, "dateofcollection": "2015-09-10", "type": 20, "id": "20|corda_______::a16918f80d830bf2b6daa5ec304f0e31"}
{"collectedfrom": [{"value": "GRID - Global Research Identifier Database", "key": "10|openaire____::ff4a008470319a22d9cf3d14af485977"}], "organization": {"metadata": {"legalshortname": {"value": "DFG"}, "websiteurl": {"value": "http://www.dfg.de/en/"}, "country": {"classid": "DE", "classname": "Germany", "schemename": "dnet:countries", "schemeid": "dnet:countries"}, "legalname": {"value": "Deutsche Forschungsgemeinschaft"}}}, "pid": [{"qualifier": {"classid": "grid", "classname": "grid", "schemename": "dnet:pid_types", "schemeid": "dnet:pid_types"}, "value": "grid.424150.6"}], "id": "20|grid________::7d83de934ecd5091d83334f752cef22c", "type": 20}
{"dateoftransformation": "2018-08-08", "originalId": ["corda_______::999547462"], "collectedfrom": [{"value": "CORDA - COmmon Research DAta Warehouse", "key": "10|openaire____::b30dac7baac631f3da7c2bb18dd9891f"}], "organization": {"metadata": {"eclegalbody": {"value": "true"}, "eclegalperson": {"value": "true"}, "country": {"classid": "DE", "classname": "Germany", "schemename": "dnet:countries", "schemeid": "dnet:countries"}, "ecnonprofit": {"value": "true"}, "websiteurl": {"value": "http://www.dfg.de"}, "ecnutscode": {"value": "false"}, "legalname": {"value": "DEUTSCHE FORSCHUNGSGEMEINSCHAFT"}}}, "dateofcollection": "2015-09-10", "type": 20, "id": "20|corda_______::3f41cfb7d56cfea69f3ce9792b822eb4"}
{"dateoftransformation": "2018-09-28", "originalId": ["dfgf________::DFG"], "collectedfrom": [{"value": "", "key": ""}], "organization": {"metadata": {"eclegalbody": {"value": "false"}, "eclegalperson": {"value": "false"}, "ecinternationalorganization": {"value": "false"}, "legalshortname": {"value": "DFG"}, "ecresearchorganization": {"value": "false"}, "ecnonprofit": {"value": "false"}, "ecenterprise": {"value": "false"}, "ecnutscode": {"value": "false"}, "ecinternationalorganizationeurinterests": {"value": "false"}, "legalname": {"value": "Deutsche Forschungsgemeinschaft"}, "echighereducation": {"value": "false"}, "ecsmevalidated": {"value": "false"}}}, "dateofcollection": "2018-09-28", "type": 20, "id": "20|dfgf________::3bbe57698e353a2acaa03306316658bb"}
{"dateoftransformation": "2018-09-28", "originalId": ["dfgf________::DFGF"], "collectedfrom": [{"value": "", "key": ""}], "organization": {"metadata": {"eclegalbody": {"value": "false"}, "eclegalperson": {"value": "false"}, "ecinternationalorganization": {"value": "false"}, "legalshortname": {"value": "DFG"}, "ecresearchorganization": {"value": "false"}, "ecnonprofit": {"value": "false"}, "ecenterprise": {"value": "false"}, "ecnutscode": {"value": "false"}, "ecinternationalorganizationeurinterests": {"value": "false"}, "legalname": {"value": "Deutsche Forschungsgemeinschaft"}, "echighereducation": {"value": "false"}, "ecsmevalidated": {"value": "false"}}}, "dateofcollection": "2018-09-28", "type": 20, "id": "20|dfgf________::14a2847759c496334d510ff8fafbd464"}
{"dateoftransformation": "2018-06-04", "originalId": ["re3data_____::bf9c8e5c69ff222e3ee2ff0fc4d2b289"], "collectedfrom": [{"value": "Registry of Research Data Repository", "key": "10|openaire____::21f8a223b9925c2f87c404096080b046"}], "organization": {"metadata": {"eclegalbody": {"value": "false"}, "eclegalperson": {"value": "false"}, "ecinternationalorganization": {"value": "false"}, "legalshortname": {"value": "German Research Foundation"}, "ecresearchorganization": {"value": "false"}, "ecnonprofit": {"value": "false"}, "ecenterprise": {"value": "false"}, "websiteurl": {"value": "http://www.dfg.de/"}, "ecnutscode": {"value": "false"}, "ecinternationalorganizationeurinterests": {"value": "false"}, "legalname": {"value": "Deutsche Forschungsgemeinschaft"}, "country": {"classid": "DE", "classname": "Germany", "schemename": "dnet:countries", "schemeid": "dnet:countries"}, "echighereducation": {"value": "false"}, "ecsmevalidated": {"value": "false"}}}, "dateofcollection": "2016-01-07", "type": 20, "id": "20|re3data_____::fbb08ab5e8cf8cd1056f61b84ddf05dd"}
{"originalId": ["https://academic.microsoft.com/#/detail/87707601"], "pid": [{"qualifier": {"classid": "urn", "classname": "urn", "schemename": "dnet:pid_types", "schemeid": "dnet:pid_types"}, "value": "http://en.wikipedia.org/wiki/Deutsche_Forschungsgemeinschaft"}, {"qualifier": {"classid": "mag_id", "classname": "Microsoft Academic Graph Identifier", "schemename": "dnet:pid_types", "schemeid": "dnet:pid_types"}, "value": "https://academic.microsoft.com/#/detail/87707601"}, {"qualifier": {"classid": "grid", "classname": "grid", "schemename": "dnet:pid_types", "schemeid": "dnet:pid_types"}, "value": "grid.424150.6"}], "collectedfrom": [{"value": "Microsoft Academic Graph", "key": "10|openaire____::5f532a3fc4f1ea403f37070f59a7a53a"}], "organization": {"metadata": {"websiteurl": {"value": "http://www.dfg.de/"}, "legalname": {"value": "Deutsche Forschungsgemeinschaft"}}}, "type": 20, "id": "20|microsoft___::e2edddabcc31b692b4ca7b89456e750a"}
{"dateoftransformation": "2018-08-08", "originalId": ["corda__h2020::999547462"], "collectedfrom": [{"value": "CORDA - COmmon Research DAta Warehouse - Horizon 2020", "key": "10|openaire____::a55eb91348674d853191f4f4fd73d078"}], "organization": {"metadata": {"eclegalbody": {"value": "true"}, "eclegalperson": {"value": "true"}, "ecinternationalorganization": {"value": "false"}, "legalshortname": {"value": "DFG"}, "ecresearchorganization": {"value": "false"}, "ecnonprofit": {"value": "true"}, "ecenterprise": {"value": "false"}, "websiteurl": {"value": "http://www.dfg.de"}, "ecnutscode": {"value": "false"}, "ecinternationalorganizationeurinterests": {"value": "false"}, "legalname": {"value": "DEUTSCHE FORSCHUNGSGEMEINSCHAFT"}, "country": {"classid": "DE", "classname": "Germany", "schemename": "dnet:countries", "schemeid": "dnet:countries"}, "echighereducation": {"value": "false"}, "ecsmevalidated": {"value": "false"}}}, "dateofcollection": "2016-01-21", "type": 20, "id": "20|corda__h2020::3f41cfb7d56cfea69f3ce9792b822eb4"}
{"dateoftransformation": "2018-06-04", "originalId": ["re3data_____::64ef0759fcfccf84cca028ba3c21aa23"], "collectedfrom": [{"value": "Registry of Research Data Repository", "key": "10|openaire____::21f8a223b9925c2f87c404096080b046"}], "organization": {"metadata": {"eclegalbody": {"value": "false"}, "eclegalperson": {"value": "false"}, "ecinternationalorganization": {"value": "false"}, "legalshortname": {"value": "Deutsche Forschungsgemeinschaft"}, "ecresearchorganization": {"value": "false"}, "ecnonprofit": {"value": "false"}, "ecenterprise": {"value": "false"}, "websiteurl": {"value": "http://www.dfg.de/en/index.jsp"}, "ecnutscode": {"value": "false"}, "ecinternationalorganizationeurinterests": {"value": "false"}, "legalname": {"value": "German Research Foundation"}, "country": {"classid": "DE", "classname": "Germany", "schemename": "dnet:countries", "schemeid": "dnet:countries"}, "echighereducation": {"value": "false"}, "ecsmevalidated": {"value": "false"}}}, "dateofcollection": "2016-01-07", "type": 20, "id": "20|re3data_____::e029b7e0de6cafc0c7126615c65458f0"}
{"dateoftransformation": "2018-06-04", "originalId": ["re3data_____::37e3bba353f88b4649d459c698483f6e"], "collectedfrom": [{"value": "Registry of Research Data Repository", "key": "10|openaire____::21f8a223b9925c2f87c404096080b046"}], "organization": {"metadata": {"eclegalbody": {"value": "false"}, "eclegalperson": {"value": "false"}, "ecinternationalorganization": {"value": "false"}, "legalshortname": {"value": "Deutsche Forschungsgemeinschaft"}, "ecresearchorganization": {"value": "false"}, "ecnonprofit": {"value": "false"}, "ecenterprise": {"value": "false"}, "websiteurl": {"value": "http://www.dfg.de/en/index.jsp"}, "ecnutscode": {"value": "false"}, "ecinternationalorganizationeurinterests": {"value": "false"}, "legalname": {"value": "German Research Association"}, "country": {"classid": "DE", "classname": "Germany", "schemename": "dnet:countries", "schemeid": "dnet:countries"}, "echighereducation": {"value": "false"}, "ecsmevalidated": {"value": "false"}}}, "dateofcollection": "2016-01-07", "type": 20, "id": "20|re3data_____::2080dc170e6cd7c6c06f403f8a08c1be"}
{"collectedfrom": [{"value": "GRID - Global Research Identifier Database", "key": "10|openaire____::ff4a008470319a22d9cf3d14af485977"}], "organization": {"metadata": {"legalshortname": {"value": "DFG"}, "websiteurl": {"value": "http://www.dfg.de/en/"}, "country": {"classid": "DE", "classname": "Germany", "schemename": "dnet:countries", "schemeid": "dnet:countries"}, "legalname": {"value": "DFG"}}}, "pid": [{"qualifier": {"classid": "grid", "classname": "grid", "schemename": "dnet:pid_types", "schemeid": "dnet:pid_types"}, "value": "grid.424150.6"}], "id": "20|grid________::085fd89ec6f3f92c354e0bc027de2a58", "type": 20}
{"collectedfrom": [{"value": "GRID - Global Research Identifier Database", "key": "10|openaire____::ff4a008470319a22d9cf3d14af485977"}], "organization": {"metadata": {"legalshortname": {"value": "DFG"}, "websiteurl": {"value": "http://www.dfg.de/en/"}, "country": {"classid": "DE", "classname": "Germany", "schemename": "dnet:countries", "schemeid": "dnet:countries"}, "legalname": {"value": "German Research Foundation"}}}, "pid": [{"qualifier": {"classid": "grid", "classname": "grid", "schemename": "dnet:pid_types", "schemeid": "dnet:pid_types"}, "value": "grid.424150.6"}], "id": "20|grid________::f0d88189673738d2a565aff99eeb59a2", "type": 20}
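The organization test configuration above lists `DomainExactMatch` on `websiteurl` among its conditions. The records above mix URL variants for the same body, such as `http://www.dfg.de`, `http://www.dfg.de/` and `http://www.dfg.de/en/index.jsp`.

Below is a hypothetical sketch of what a domain-level comparison has to do with such values:
1. Parse each URL with `java.net.URI`.
2. Lowercase the host and drop a leading `www.`.
3. Compare the resulting hosts.

This is not the pace implementation of `DomainExactMatch`. In particular, the real condition may treat missing URLs differently from the `false` returned here; the model marks `websiteurl` as `ignoreMissing: true`.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

/** Hypothetical domain-level URL comparison; not the pace DomainExactMatch implementation. */
public class DomainMatchSketch {

    /** Lowercased host without a leading "www.", or null when the URL is missing or unparsable. */
    public static String normalizedHost(String url) {
        if (url == null || url.trim().isEmpty()) return null;
        try {
            String host = new URI(url.trim()).getHost();
            if (host == null) return null;
            host = host.toLowerCase(Locale.ROOT);
            return host.startsWith("www.") ? host.substring(4) : host;
        } catch (URISyntaxException e) {
            return null;
        }
    }

    /** True only when both URLs yield the same normalized host; missing values never match here. */
    public static boolean domainExactMatch(String a, String b) {
        String hostA = normalizedHost(a);
        return hostA != null && hostA.equals(normalizedHost(b));
    }

    public static void main(String[] args) {
        System.out.println(domainExactMatch("http://www.dfg.de", "http://www.dfg.de/en/index.jsp")); // true
        System.out.println(domainExactMatch("http://www.fwf.ac.at/", "https://www.fwf.ac.at/en/"));   // true
        System.out.println(domainExactMatch("http://www.unito.it", "http://www.dfg.de/"));            // false
    }
}
```

Comparing hosts rather than full URLs lets trailing slashes, `http`/`https` and language-specific paths (`/en/`) collapse to the same value. It also means that distinct sub-organizations hosted under one domain would compare as equal.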

File diff suppressed because it is too large

@ -13,10 +13,7 @@ import eu.dnetlib.data.proto.ResultProtos.Result;
import eu.dnetlib.pace.config.Config;
import eu.dnetlib.pace.config.DedupConfig;
import eu.dnetlib.pace.config.Type;
import eu.dnetlib.pace.model.Field;
import eu.dnetlib.pace.model.FieldValueImpl;
import eu.dnetlib.pace.model.MapDocument;
import eu.dnetlib.pace.model.ProtoDocumentBuilder;
import eu.dnetlib.pace.model.*;
import eu.dnetlib.pace.model.gt.GTAuthor;
import org.apache.commons.io.IOUtils;
import org.apache.commons.lang.RandomStringUtils;
@ -25,9 +22,7 @@ import org.apache.commons.lang3.RandomUtils;
import java.io.IOException;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.*;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
@ -49,11 +44,13 @@ public abstract class AbstractProtoPaceTest extends OafTest {
return DedupConfig.load(readFromClasspath("/eu/dnetlib/pace/organization.pace.conf"));
}
protected DedupConfig getOrganizationTestConf() {
return DedupConfig.load(readFromClasspath("/eu/dnetlib/pace/organization.test.conf"));
}
protected DedupConfig getAuthorsTestConf() {
return DedupConfig.load(readFromClasspath("/eu/dnetlib/pace/authors.test.pace.conf"));
}
protected DedupConfig getResultAuthorsConf() {
return DedupConfig.load(readFromClasspath("/eu/dnetlib/pace/result.authors.pace.conf"));
@ -108,6 +105,29 @@ public abstract class AbstractProtoPaceTest extends OafTest {
return result(config, id, title, date, Lists.newArrayList(pid), authors);
}
protected MapDocument author(final String identifier, final String area, final String firstname, final String lastname, final String fullname, final Double[] topics, final String pubID, final String pubDOI, final int rank, final String orcid, final List<String> coauthors) {
Map<String, Field> fieldMap = new HashMap<>();
fieldMap.put("area", new FieldValueImpl(Type.String, "area", area));
fieldMap.put("firstname", new FieldValueImpl(Type.String, "firstname", firstname));
fieldMap.put("lastname", new FieldValueImpl(Type.String, "lastname", lastname));
fieldMap.put("fullname", new FieldValueImpl(Type.String, "fullname", fullname));
fieldMap.put("pubID", new FieldValueImpl(Type.String, "pubID", pubID));
fieldMap.put("pubDOI", new FieldValueImpl(Type.String, "pubDOI", pubDOI));
fieldMap.put("rank", new FieldValueImpl(Type.Int, "rank", rank));
fieldMap.put("orcid", new FieldValueImpl(Type.String, "orcid", orcid));
FieldListImpl ca = new FieldListImpl("coauthors", Type.String);
ca.addAll(coauthors.stream().map(s -> new FieldValueImpl(Type.String, "coauthors", s)).collect(Collectors.toList()));
fieldMap.put("coauthors", ca);
FieldListImpl t = new FieldListImpl("topics", Type.String);
t.addAll(Arrays.stream(topics).map(d -> new FieldValueImpl(Type.String, "topics", d.toString())).collect(Collectors.toList()));
fieldMap.put("topics", t);
return new MapDocument(identifier, fieldMap);
}
static List<String> pidTypes = Lists.newArrayList();
static {
pidTypes.add("doi");

@ -0,0 +1,79 @@
package eu.dnetlib.pace;
import org.apache.commons.io.IOUtils;
import org.apache.oozie.client.OozieClient;
import org.apache.oozie.client.OozieClientException;
import org.apache.oozie.client.WorkflowJob;
import org.junit.Test;
import java.io.*;
import java.util.Properties;
import static org.junit.Assert.assertEquals;
public class DedupTestIT {
@Test
public void deduplicationTest() throws OozieClientException, InterruptedException {
//read properties to use in the oozie workflow
Properties prop = readProperties("/eu/dnetlib/test/properties/config.properties");
/*OOZIE WORKFLOW CREATION AND LAUNCH*/
// get an OozieClient for the remote Oozie server
OozieClient wc = new OozieClient("http://hadoop-edge3.garr-pa1.d4science.org:11000/oozie");
// create a workflow job configuration and set the workflow application path
Properties conf = wc.createConfiguration();
conf.setProperty(OozieClient.APP_PATH, "hdfs://hadoop-rm1.garr-pa1.d4science.org:8020/user/michele.debonis/oozieJob/workflow.xml");
conf.setProperty(OozieClient.USER_NAME, "michele.debonis");
conf.setProperty("oozie.action.sharelib.for.spark", "spark2");
// setting workflow parameters
conf.setProperty("jobTracker", "hadoop-rm3.garr-pa1.d4science.org:8032");
conf.setProperty("nameNode", "hdfs://hadoop-rm1.garr-pa1.d4science.org:8020");
conf.setProperty("dedupConfiguration", prop.getProperty("dedup.configuration"));
conf.setProperty("inputSpace", prop.getProperty("input.space"));
// conf.setProperty("inputDir", "/usr/tucu/inputdir");
// conf.setProperty("outputDir", "/usr/tucu/outputdir");
// submit and start the workflow job
String jobId = wc.run(conf);
System.out.println("Workflow job submitted");
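// wait until the workflow job finishes, printing the status every 10 seconds;
// a freshly submitted job may still be in PREP, so treat it as not finished too
WorkflowJob.Status status = wc.getJobInfo(jobId).getStatus();
while (status == WorkflowJob.Status.PREP || status == WorkflowJob.Status.RUNNING) {
System.out.println(wc.getJobInfo(jobId));
Thread.sleep(10 * 1000);
status = wc.getJobInfo(jobId).getStatus();
}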
// print the final status of the workflow job
System.out.println(wc.getJobInfo(jobId));
System.out.println("JOB LOG = " + wc.getJobLog(jobId));
assertEquals(WorkflowJob.Status.SUCCEEDED, wc.getJobInfo(jobId).getStatus());
}
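static Properties readProperties(final String propFile) {
Properties prop = new Properties();
try (InputStream in = DedupTestIT.class.getResourceAsStream(propFile)) {
// getResourceAsStream returns null (not an exception) when the file is missing
if (in == null) {
throw new IllegalArgumentException("properties file not found on classpath: " + propFile);
}
prop.load(in);
} catch (IOException e) {
throw new RuntimeException("cannot load properties from classpath: " + propFile, e);
}
return prop;
}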
static String readFromClasspath(final String filename) {
final StringWriter sw = new StringWriter();
try {
IOUtils.copy(DedupTestIT.class.getResourceAsStream(filename), sw);
return sw.toString();
} catch (final IOException e) {
throw new RuntimeException("cannot load resource from classpath: " + filename, e);
}
}
}
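The wait loop in `deduplicationTest` has no upper bound, so a workflow that never leaves its running state blocks the test indefinitely. A minimal sketch of a bounded poll, assuming a hypothetical `JobPoller` helper that is not part of this repository; the status type is generic so it can be exercised without an Oozie server:

```java
import java.util.Set;
import java.util.function.Supplier;

/** Hypothetical helper, not part of dnet-dedup: bounded polling of a job status. */
public class JobPoller {

    /**
     * Polls {@code status} every {@code pollMillis} ms while it returns one of
     * {@code activeStates}, and returns the first status outside that set.
     *
     * @throws IllegalStateException if {@code timeoutMillis} elapses first or the thread is interrupted
     */
    public static <S> S waitUntilDone(Supplier<S> status, Set<S> activeStates,
                                      long pollMillis, long timeoutMillis) {
        final long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        S current = status.get();
        while (activeStates.contains(current)) {
            if (System.nanoTime() - deadline >= 0) {
                throw new IllegalStateException("timed out while job status was " + current);
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                throw new IllegalStateException("interrupted while waiting for job", e);
            }
            current = status.get();
        }
        return current;
    }
}
```

In `DedupTestIT` the supplier would wrap `wc.getJobInfo(jobId).getStatus()` (rethrowing the checked `OozieClientException` as an unchecked exception), with the Oozie states that mean "not finished yet", presumably `PREP` and `RUNNING`, as the active set.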

@ -0,0 +1,17 @@
package eu.dnetlib.pace;
import eu.dnetlib.SparkLocalTest;
import org.junit.Test;
import java.io.IOException;
public class SparkTester {
@Test
public void sparkLocalTest() throws IOException {
SparkLocalTest.main(new String[]{});
}
}

@ -0,0 +1,40 @@
{
"wf" : {
"threshold" : "0.99",
"dedupRun" : "001",
"entityType" : "person",
"orderField" : "fullname",
"queueMaxSize" : "2000",
"groupMaxSize" : "10",
"slidingWindowSize" : "200",
"rootBuilder" : [ "person" ],
"includeChildren" : "true"
},
"pace": {
"clustering": [
{"name": "personClustering", "fields": ["fullname"], "params": {}}
],
"conditions": [],
"decisionTree": {
"start": {"fields": [{"field":"pubID", "comparator":"exactMatch", "weight":1.0, "params":{}}], "threshold":1.0, "aggregation": "SUM", "positive":"NO_MATCH", "negative":"layer2", "undefined": "layer2", "ignoreMissing": "false"},
"layer2": {"fields": [{"field":"orcid", "comparator":"exactMatch", "weight":1.0, "params":{}}], "threshold":1.0, "aggregation": "SUM", "positive":"ORCID_MATCH", "negative":"NO_MATCH", "undefined": "layer3", "ignoreMissing": "false"},
"layer3": {"fields": [{"field":"firstname", "comparator":"similar", "weight":1.0, "params":{}}], "threshold":0.7, "aggregation": "SUM", "positive":"layer4", "negative":"NO_MATCH", "undefined": "layer4", "ignoreMissing": "false"},
"layer4": {"fields": [{"field":"coauthors", "comparator":"coauthorsMatch", "weight":1.0, "params":{"minCoauthors":6, "maxCoauthors": 200}}], "threshold":5.0, "aggregation": "SUM", "positive":"COAUTHORS_MATCH", "negative":"NO_MATCH", "undefined": "layer5", "ignoreMissing": "false"},
"layer5": {"fields": [{"field":"area", "comparator":"exactMatch", "weight":1.0, "params":{}}], "threshold":1.0, "aggregation": "SUM", "positive":"layer6", "negative":"NO_MATCH", "undefined": "NO_MATCH", "ignoreMissing": "false"},
"layer6": {"fields": [{"field":"topics", "comparator":"topicsMatch", "weight":1.0, "params":{}}], "threshold":0.7, "aggregation": "SUM", "positive":"TOPICS_MATCH", "negative":"NO_MATCH", "undefined": "NO_MATCH", "ignoreMissing": "false"}
},
"model": [
{"name": "fullname", "algo": "Null", "type": "String", "weight": "0", "ignoreMissing": "false", "path": "person/metadata/fullname"},
{"name": "firstname", "algo": "Null", "type": "String", "weight": "0", "ignoreMissing": "false", "path": "person/metadata/firstname"},
{"name": "lastname", "algo": "Null", "type": "String", "weight": "0", "ignoreMissing": "false", "path": "person/metadata/lastname"},
{"name": "coauthors", "algo": "Null", "type": "String", "weight": "0", "ignoreMissing": "false", "path": "person/metadata/coauthors"},
{"name": "orcid", "algo": "Null", "type": "String", "weight": "0", "ignoreMissing": "false", "path": "person/metadata/orcid"},
{"name": "topics", "algo": "Null", "type": "String", "weight": "0", "ignoreMissing": "false", "path": "person/metadata/topics"},
{"name": "pubID", "algo": "Null", "type": "String", "weight": "0", "ignoreMissing": "false", "path": "person/metadata/pubID"},
{"name": "pubDOI", "algo": "Null", "type": "String", "weight": "0", "ignoreMissing": "false", "path": "person/metadata/pubDOI"},
{"name": "rank", "algo": "Null", "type": "Int", "weight": "0", "ignoreMissing": "false", "path": "person/metadata/rank"},
{"name": "area", "algo": "Null", "type": "String", "weight": "0", "ignoreMissing": "false", "path": "person/metadata/area"}
],
"blacklists": {}
}
}
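The `decisionTree` above presumably starts at `start` and follows each node's `positive`, `negative` or `undefined` arc until it reaches a name that is not a node (`NO_MATCH`, `ORCID_MATCH`, `COAUTHORS_MATCH`, `TOPICS_MATCH`). A minimal sketch of that traversal, following the rules stated in the README (scores multiplied by their weight, then AVG/MAX/MIN/SUM aggregation; `>= threshold` is positive, `-1` is undefined). Assumptions: all class and method names are hypothetical, documents are flat `Map<String, String>`s, and comparisons returning -1 are skipped during aggregation, with the node undefined only when every comparison is -1; the real dnet-pace-core handling (including `ignoreMissing`) may differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a pace-style decision tree; not the dnet-pace-core implementation. */
public class DecisionTreeSketch {

    /** Sentinel returned by a comparator when the comparison is not possible. */
    public static final double UNDEFINED = -1.0;

    public enum Aggregation { AVG, MAX, MIN, SUM }

    /** Compares two documents and returns a score, or UNDEFINED. */
    public interface FieldComparator {
        double compare(Map<String, String> a, Map<String, String> b);
    }

    public static final class WeightedComparator {
        final FieldComparator comparator;
        final double weight;

        public WeightedComparator(FieldComparator comparator, double weight) {
            this.comparator = comparator;
            this.weight = weight;
        }
    }

    public static final class Node {
        final List<WeightedComparator> fields;
        final double threshold;
        final Aggregation aggregation;
        final String positive, negative, undefined;

        public Node(List<WeightedComparator> fields, double threshold, Aggregation aggregation,
                    String positive, String negative, String undefined) {
            this.fields = fields;
            this.threshold = threshold;
            this.aggregation = aggregation;
            this.positive = positive;
            this.negative = negative;
            this.undefined = undefined;
        }
    }

    /** Exact string match on one field; UNDEFINED if either side is missing or empty. */
    public static FieldComparator exactMatch(String field) {
        return (a, b) -> {
            String x = a.get(field), y = b.get(field);
            if (x == null || y == null || x.isEmpty() || y.isEmpty()) return UNDEFINED;
            return x.equals(y) ? 1.0 : 0.0;
        };
    }

    /** Weights each defined score, then aggregates; UNDEFINED if no comparison was possible. */
    static double score(Node node, Map<String, String> a, Map<String, String> b) {
        List<Double> scores = new ArrayList<>();
        for (WeightedComparator wc : node.fields) {
            double s = wc.comparator.compare(a, b);
            if (s != UNDEFINED) scores.add(s * wc.weight); // assumption: undefined comparisons are skipped
        }
        if (scores.isEmpty()) return UNDEFINED;
        switch (node.aggregation) {
            case SUM: return scores.stream().mapToDouble(Double::doubleValue).sum();
            case AVG: return scores.stream().mapToDouble(Double::doubleValue).average().getAsDouble();
            case MAX: return Collections.max(scores);
            case MIN: return Collections.min(scores);
            default: throw new IllegalArgumentException("unknown aggregation " + node.aggregation);
        }
    }

    /** Follows the arcs from startNode until a name that is not a node key (the final result). */
    public static String evaluate(Map<String, Node> tree, String startNode,
                                  Map<String, String> a, Map<String, String> b) {
        String current = startNode;
        int steps = 0;
        while (tree.containsKey(current)) {
            if (++steps > tree.size()) throw new IllegalStateException("cycle at node " + current);
            Node node = tree.get(current);
            double s = score(node, a, b);
            if (s == UNDEFINED) current = node.undefined;
            else if (s >= node.threshold) current = node.positive;
            else current = node.negative;
        }
        return current;
    }
}
```

With the first two layers of the configuration, two author records sharing a `pubID` end in `NO_MATCH` (presumably because two authors of the same publication are different people), while records from different publications with the same `orcid` end in `ORCID_MATCH`.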

File diff suppressed because it is too large

@ -0,0 +1,2 @@
input.space = /eu/dnetlib/pace/organization.to.fix.json
dedup.configuration = /eu/dnetlib/pace/org.curr.conf
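Both keys in this file are passed straight into the Oozie job configuration by `DedupTestIT`, and `Properties.setProperty` rejects a `null` value with a `NullPointerException`, so a misspelled key surfaces far from its cause. A minimal sketch of a fail-fast lookup, assuming a hypothetical `RequiredProperties` helper that is not part of this repository:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

/** Hypothetical helper, not part of dnet-dedup: parse properties and fail fast on missing keys. */
public class RequiredProperties {

    /** Parses .properties text (same syntax as config.properties). */
    public static Properties parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) { // StringReader does no real I/O, but load() declares it
            throw new IllegalArgumentException("unparseable properties", e);
        }
        return p;
    }

    /** Returns the trimmed value of key, or throws an exception naming the missing key. */
    public static String require(Properties p, String key) {
        String value = p.getProperty(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException("missing required property: " + key);
        }
        return value.trim();
    }
}
```

In the test, the two `conf.setProperty(...)` calls would then read `require(prop, "dedup.configuration")` and `require(prop, "input.space")`.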

109
dnet-dedup.ipr Normal file
@ -0,0 +1,109 @@
<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<project version="4" relativePaths="false">
<component name="ProjectRootManager" version="2" assert-keyword="true" project-jdk-name="1.8" jdk-15="true"/>
<component name="CodeStyleManager">
<option name="USE_DEFAULT_CODE_STYLE_SCHEME" value="true"/>
<option name="CODE_STYLE_SCHEME" value=""/>
</component>
<component name="libraryTable"/>
<component name="CompilerConfiguration">
<option name="DEFAULT_COMPILER" value="Javac"/>
<option name="CLEAR_OUTPUT_DIRECTORY" value="false"/>
<!--
<wildcardResourcePatterns>
<entry name="${wildcardResourcePattern}"/>
</wildcardResourcePatterns>
-->
<wildcardResourcePatterns>
<entry name="!?*.java"/>
</wildcardResourcePatterns>
</component>
<component name="JavacSettings">
<option name="DEBUGGING_INFO" value="true"/>
<option name="GENERATE_NO_WARNINGS" value="false"/>
<option name="DEPRECATION" value="true"/>
<option name="ADDITIONAL_OPTIONS_STRING" value=""/>
<option name="MAXIMUM_HEAP_SIZE" value="128"/>
<option name="USE_GENERICS_COMPILER" value="false"/>
</component>
<component name="JikesSettings">
<option name="DEBUGGING_INFO" value="true"/>
<option name="DEPRECATION" value="true"/>
<option name="GENERATE_NO_WARNINGS" value="false"/>
<option name="GENERATE_MAKE_FILE_DEPENDENCIES" value="false"/>
<option name="DO_FULL_DEPENDENCE_CHECK" value="false"/>
<option name="IS_INCREMENTAL_MODE" value="false"/>
<option name="IS_EMACS_ERRORS_MODE" value="true"/>
<option name="ADDITIONAL_OPTIONS_STRING" value=""/>
<option name="MAXIMUM_HEAP_SIZE" value="128"/>
</component>
<component name="AntConfiguration">
<option name="IS_AUTOSCROLL_TO_SOURCE" value="false"/>
<option name="FILTER_TARGETS" value="false"/>
</component>
<component name="JavadocGenerationManager">
<option name="OUTPUT_DIRECTORY"/>
<option name="OPTION_SCOPE" value="protected"/>
<option name="OPTION_HIERARCHY" value="false"/>
<option name="OPTION_NAVIGATOR" value="false"/>
<option name="OPTION_INDEX" value="false"/>
<option name="OPTION_SEPARATE_INDEX" value="false"/>
<option name="OPTION_USE_1_1" value="false"/>
<option name="OPTION_DOCUMENT_TAG_USE" value="false"/>
<option name="OPTION_DOCUMENT_TAG_AUTHOR" value="false"/>
<option name="OPTION_DOCUMENT_TAG_VERSION" value="false"/>
<option name="OPTION_DOCUMENT_TAG_DEPRECATED" value="false"/>
<option name="OPTION_DEPRECATED_LIST" value="false"/>
<option name="OTHER_OPTIONS"/>
<option name="HEAP_SIZE"/>
<option name="OPEN_IN_BROWSER" value="false"/>
</component>
<component name="JUnitProjectSettings">
<option name="TEST_RUNNER" value="UI"/>
</component>
<component name="EntryPointsManager">
<entry_points/>
</component>
<component name="DataSourceManager"/>
<component name="ExportToHTMLSettings">
<option name="PRINT_LINE_NUMBERS" value="false"/>
<option name="OPEN_IN_BROWSER" value="false"/>
<option name="OUTPUT_DIRECTORY"/>
</component>
<component name="ImportConfiguration">
<option name="VENDOR"/>
<option name="RELEASE_TAG"/>
<option name="LOG_MESSAGE"/>
<option name="CHECKOUT_AFTER_IMPORT" value="true"/>
</component>
<component name="ProjectModuleManager">
<modules>
<!-- module filepath="$$PROJECT_DIR$$/${pom.artifactId}.iml"/ -->
<module filepath="$PROJECT_DIR$/dnet-dedup.iml"/>
<module filepath="$PROJECT_DIR$/dnet-pace-core/dnet-pace-core.iml"/>
<module filepath="$PROJECT_DIR$/dnet-dedup-test/dnet-dedup-test.iml"/>
</modules>
</component>
<UsedPathMacros>
<!--<macro name="cargo"></macro>-->
</UsedPathMacros>
</project>

418
dnet-dedup.iws Normal file
@ -0,0 +1,418 @@
<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<project version="4" relativePaths="false">
<component name="LvcsProjectConfiguration">
<option name="ADD_LABEL_ON_PROJECT_OPEN" value="true"/>
<option name="ADD_LABEL_ON_PROJECT_COMPILATION" value="true"/>
<option name="ADD_LABEL_ON_FILE_PACKAGE_COMPILATION" value="true"/>
<option name="ADD_LABEL_ON_PROJECT_MAKE" value="true"/>
<option name="ADD_LABEL_ON_RUNNING" value="true"/>
<option name="ADD_LABEL_ON_DEBUGGING" value="true"/>
<option name="ADD_LABEL_ON_UNIT_TEST_PASSED" value="true"/>
<option name="ADD_LABEL_ON_UNIT_TEST_FAILED" value="true"/>
</component>
<component name="PropertiesComponent">
<property name="MemberChooser.copyJavadoc" value="false"/>
<property name="GoToClass.includeLibraries" value="false"/>
<property name="MemberChooser.showClasses" value="true"/>
<property name="MemberChooser.sorted" value="false"/>
<property name="GoToFile.includeJavaFiles" value="false"/>
<property name="GoToClass.toSaveIncludeLibraries" value="false"/>
</component>
<component name="ToolWindowManager">
<frame x="-4" y="-4" width="1032" height="746" extended-state="6"/>
<editor active="false"/>
<layout>
<window_info id="CVS" active="false" anchor="bottom" auto_hide="false" internal_type="docked" type="docked" visible="false" weight="0.33" order="-1"/>
<window_info id="TODO" active="false" anchor="bottom" auto_hide="false" internal_type="docked" type="docked" visible="false" weight="0.33" order="7"/>
<window_info id="Project" active="false" anchor="left" auto_hide="false" internal_type="docked" type="docked" visible="false" weight="0.25" order="0"/>
<window_info id="Find" active="false" anchor="bottom" auto_hide="false" internal_type="docked" type="docked" visible="false" weight="0.33" order="1"/>
<window_info id="Structure" active="false" anchor="left" auto_hide="false" internal_type="docked" type="docked" visible="false" weight="0.25" order="1"/>
<window_info id="Messages" active="false" anchor="bottom" auto_hide="false" internal_type="docked" type="docked" visible="false" weight="0.33" order="-1"/>
<window_info id="Inspection" active="false" anchor="bottom" auto_hide="false" internal_type="docked" type="docked" visible="false" weight="0.4" order="6"/>
<window_info id="Aspects" active="false" anchor="right" auto_hide="false" internal_type="docked" type="docked" visible="false" weight="0.33" order="-1"/>
<window_info id="Ant Build" active="false" anchor="right" auto_hide="false" internal_type="docked" type="docked" visible="false" weight="0.25" order="1"/>
<window_info id="Run" active="false" anchor="bottom" auto_hide="false" internal_type="docked" type="docked" visible="false" weight="0.33" order="2"/>
<window_info id="Hierarchy" active="false" anchor="right" auto_hide="false" internal_type="docked" type="docked" visible="false" weight="0.25" order="2"/>
<window_info id="Debug" active="false" anchor="bottom" auto_hide="false" internal_type="docked" type="docked" visible="false" weight="0.4" order="4"/>
<window_info id="Commander" active="false" anchor="right" auto_hide="false" internal_type="sliding" type="sliding" visible="false" weight="0.4" order="0"/>
<window_info id="Web" active="false" anchor="left" auto_hide="false" internal_type="docked" type="docked" visible="false" weight="0.25" order="2"/>
<window_info id="Message" active="false" anchor="bottom" auto_hide="false" internal_type="docked" type="docked" visible="false" weight="0.33" order="0"/>
<window_info id="EJB" active="false" anchor="bottom" auto_hide="false" internal_type="docked" type="docked" visible="false" weight="0.25" order="3"/>
<window_info id="Cvs" active="false" anchor="bottom" auto_hide="false" internal_type="docked" type="docked" visible="false" weight="0.25" order="5"/>
</layout>
</component>
<component name="ErrorTreeViewConfiguration">
<option name="IS_AUTOSCROLL_TO_SOURCE" value="false"/>
<option name="HIDE_WARNINGS" value="false"/>
</component>
<component name="StructureViewFactory">
<option name="SORT_MODE" value="0"/>
<option name="GROUP_INHERITED" value="true"/>
<option name="AUTOSCROLL_MODE" value="true"/>
<option name="SHOW_FIELDS" value="true"/>
<option name="AUTOSCROLL_FROM_SOURCE" value="false"/>
<option name="GROUP_GETTERS_AND_SETTERS" value="true"/>
<option name="SHOW_INHERITED" value="false"/>
<option name="HIDE_NOT_PUBLIC" value="false"/>
</component>
<component name="ProjectViewSettings">
<navigator currentView="ProjectPane" flattenPackages="false" showMembers="false" showStructure="false" autoscrollToSource="false" splitterProportion="0.5"/>
<view id="ProjectPane">
<expanded_node type="directory" url="file://$PROJECT_DIR$"/>
</view>
<view id="SourcepathPane"/>
<view id="ClasspathPane"/>
</component>
<component name="Commander">
<leftPanel view="Project"/>
<rightPanel view="Project"/>
<splitter proportion="0.5"/>
</component>
<component name="AspectsView"/>
<component name="SelectInManager"/>
<component name="HierarchyBrowserManager">
<option name="SHOW_PACKAGES" value="false"/>
<option name="IS_AUTOSCROLL_TO_SOURCE" value="false"/>
<option name="SORT_ALPHABETICALLY" value="false"/>
</component>
<component name="TodoView" selected-index="0">
<todo-panel id="selected-file">
<are-packages-shown value="false"/>
<flatten-packages value="false"/>
<is-autoscroll-to-source value="true"/>
</todo-panel>
<todo-panel id="all">
<are-packages-shown value="true"/>
<flatten-packages value="false"/>
<is-autoscroll-to-source value="true"/>
</todo-panel>
</component>
<component name="editorManager"/>
<component name="editorHistoryManager"/>
<component name="DaemonCodeAnalyzer">
<disable_hints/>
</component>
<component name="InspectionManager">
<option name="AUTOSCROLL_TO_SOURCE" value="false"/>
<option name="SPLITTER_PROPORTION" value="0.5"/>
<profile name="Default"/>
</component>
<component name="BookmarkManager"/>
<component name="DebuggerManager">
<line_breakpoints/>
<exception_breakpoints>
<breakpoint_any>
<option name="NOTIFY_CAUGHT" value="true"/>
<option name="NOTIFY_UNCAUGHT" value="true"/>
<option name="ENABLED" value="false"/>
<option name="SUSPEND_VM" value="true"/>
<option name="COUNT_FILTER_ENABLED" value="false"/>
<option name="COUNT_FILTER" value="0"/>
<option name="CONDITION_ENABLED" value="false"/>
<option name="CONDITION"/>
<option name="LOG_ENABLED" value="false"/>
<option name="LOG_EXPRESSION_ENABLED" value="false"/>
<option name="LOG_MESSAGE"/>
<option name="CLASS_FILTERS_ENABLED" value="false"/>
<option name="INVERSE_CLASS_FILLTERS" value="false"/>
<option name="SUSPEND_POLICY" value="SuspendAll"/>
</breakpoint_any>
</exception_breakpoints>
<field_breakpoints/>
<method_breakpoints/>
</component>
<component name="DebuggerSettings">
<option name="TRACING_FILTERS_ENABLED" value="true"/>
<option name="TOSTRING_CLASSES_ENABLED" value="false"/>
<option name="VALUE_LOOKUP_DELAY" value="700"/>
<option name="DEBUGGER_TRANSPORT" value="0"/>
<option name="FORCE_CLASSIC_VM" value="true"/>
<option name="HIDE_DEBUGGER_ON_PROCESS_TERMINATION" value="false"/>
<option name="SKIP_SYNTHETIC_METHODS" value="true"/>
<option name="SKIP_CONSTRUCTORS" value="false"/>
<option name="STEP_THREAD_SUSPEND_POLICY" value="SuspendThread"/>
<default_breakpoint_settings>
<option name="NOTIFY_CAUGHT" value="true"/>
<option name="NOTIFY_UNCAUGHT" value="true"/>
<option name="WATCH_MODIFICATION" value="true"/>
<option name="WATCH_ACCESS" value="true"/>
<option name="WATCH_ENTRY" value="true"/>
<option name="WATCH_EXIT" value="true"/>
<option name="ENABLED" value="true"/>
<option name="SUSPEND_VM" value="true"/>
<option name="COUNT_FILTER_ENABLED" value="false"/>
<option name="COUNT_FILTER" value="0"/>
<option name="CONDITION_ENABLED" value="false"/>
<option name="CONDITION"/>
<option name="LOG_ENABLED" value="false"/>
<option name="LOG_EXPRESSION_ENABLED" value="false"/>
<option name="LOG_MESSAGE"/>
<option name="CLASS_FILTERS_ENABLED" value="false"/>
<option name="INVERSE_CLASS_FILLTERS" value="false"/>
<option name="SUSPEND_POLICY" value="SuspendAll"/>
</default_breakpoint_settings>
<filter>
<option name="PATTERN" value="com.sun.*"/>
<option name="ENABLED" value="true"/>
</filter>
<filter>
<option name="PATTERN" value="java.*"/>
<option name="ENABLED" value="true"/>
</filter>
<filter>
<option name="PATTERN" value="javax.*"/>
<option name="ENABLED" value="true"/>
</filter>
<filter>
<option name="PATTERN" value="org.omg.*"/>
<option name="ENABLED" value="true"/>
</filter>
<filter>
<option name="PATTERN" value="sun.*"/>
<option name="ENABLED" value="true"/>
</filter>
<filter>
<option name="PATTERN" value="junit.*"/>
<option name="ENABLED" value="true"/>
</filter>
</component>
<component name="CompilerWorkspaceConfiguration">
<option name="COMPILE_IN_BACKGROUND" value="false"/>
<option name="AUTO_SHOW_ERRORS_IN_EDITOR" value="true"/>
</component>
<component name="RunManager">
<activeType name="Application"/>
<configuration selected="false" default="true" type="Applet" factoryName="Applet">
<module name=""/>
<option name="MAIN_CLASS_NAME"/>
<option name="HTML_FILE_NAME"/>
<option name="HTML_USED" value="false"/>
<option name="WIDTH" value="400"/>
<option name="HEIGHT" value="300"/>
<option name="POLICY_FILE" value="$APPLICATION_HOME_DIR$/bin/appletviewer.policy"/>
<option name="VM_PARAMETERS"/>
</configuration>
<configuration selected="false" default="true" type="Remote" factoryName="Remote">
<option name="USE_SOCKET_TRANSPORT" value="true"/>
<option name="SERVER_MODE" value="false"/>
<option name="SHMEM_ADDRESS" value="javadebug"/>
<option name="HOST" value="localhost"/>
<option name="PORT" value="5005"/>
</configuration>
<configuration selected="false" default="true" type="Application" factoryName="Application">
<option name="MAIN_CLASS_NAME"/>
<option name="VM_PARAMETERS"/>
<option name="PROGRAM_PARAMETERS"/>
<option name="WORKING_DIRECTORY" value="$PROJECT_DIR$"/>
<module name=""/>
</configuration>
<configuration selected="false" default="true" type="JUnit" factoryName="JUnit">
<module name=""/>
<option name="PACKAGE_NAME"/>
<option name="MAIN_CLASS_NAME"/>
<option name="METHOD_NAME"/>
<option name="TEST_OBJECT" value="class"/>
<option name="VM_PARAMETERS"/>
<option name="PARAMETERS"/>
<option name="WORKING_DIRECTORY" value="$PROJECT_DIR$"/>
<option name="ADDITIONAL_CLASS_PATH"/>
<option name="TEST_SEARCH_SCOPE">
<value defaultName="wholeProject"/>
</option>
</configuration>
</component>
<component name="VcsManagerConfiguration">
<option name="ACTIVE_VCS_NAME" value="git"/>
<option name="STATE" value="0"/>
</component>
<component name="VssConfiguration">
<CheckoutOptions>
<option name="COMMENT" value=""/>
<option name="DO_NOT_GET_LATEST_VERSION" value="false"/>
<option name="REPLACE_WRITABLE" value="false"/>
<option name="RECURSIVE" value="false"/>
</CheckoutOptions>
<CheckinOptions>
<option name="COMMENT" value=""/>
<option name="KEEP_CHECKED_OUT" value="false"/>
<option name="RECURSIVE" value="false"/>
</CheckinOptions>
<AddOptions>
<option name="COMMENT" value=""/>
<option name="STORE_ONLY_LATEST_VERSION" value="false"/>
<option name="CHECK_OUT_IMMEDIATELY" value="false"/>
<option name="FILE_TYPE" value="0"/>
</AddOptions>
<UndocheckoutOptions>
<option name="MAKE_WRITABLE" value="false"/>
<option name="REPLACE_LOCAL_COPY" value="0"/>
<option name="RECURSIVE" value="false"/>
</UndocheckoutOptions>
<DiffOptions>
<option name="IGNORE_WHITE_SPACE" value="false"/>
<option name="IGNORE_CASE" value="false"/>
</DiffOptions>
<GetOptions>
<option name="REPLACE_WRITABLE" value="0"/>
<option name="MAKE_WRITABLE" value="false"/>
<option name="RECURSIVE" value="false"/>
</GetOptions>
<option name="CLIENT_PATH" value=""/>
<option name="SRCSAFEINI_PATH" value=""/>
<option name="USER_NAME" value=""/>
<option name="PWD" value=""/>
<option name="SHOW_CHECKOUT_OPTIONS" value="true"/>
<option name="SHOW_ADD_OPTIONS" value="true"/>
<option name="SHOW_UNDOCHECKOUT_OPTIONS" value="true"/>
<option name="SHOW_DIFF_OPTIONS" value="true"/>
<option name="SHOW_GET_OPTIONS" value="true"/>
<option name="USE_EXTERNAL_DIFF" value="false"/>
<option name="EXTERNAL_DIFF_PATH" value=""/>
<option name="REUSE_LAST_COMMENT" value="false"/>
<option name="PUT_FOCUS_INTO_COMMENT" value="false"/>
<option name="SHOW_CHECKIN_OPTIONS" value="true"/>
<option name="LAST_COMMIT_MESSAGE" value=""/>
<option name="CHECKIN_DIALOG_SPLITTER_PROPORTION" value="0.8"/>
</component>
<component name="CheckinPanelState"/>
<component name="WebViewSettings">
<webview flattenPackages="false" showMembers="false" autoscrollToSource="false"/>
</component>
<component name="EjbViewSettings">
<EjbView showMembers="false" autoscrollToSource="false"/>
</component>
<component name="AppServerRunManager"/>
<component name="StarteamConfiguration">
<option name="SERVER" value=""/>
<option name="PORT" value="49201"/>
<option name="USER" value=""/>
<option name="PASSWORD" value=""/>
<option name="PROJECT" value=""/>
<option name="VIEW" value=""/>
<option name="ALTERNATIVE_WORKING_PATH" value=""/>
<option name="PUT_FOCUS_INTO_COMMENT" value="false"/>
<option name="SHOW_CHECKIN_OPTIONS" value="true"/>
<option name="LAST_COMMIT_MESSAGE" value=""/>
<option name="CHECKIN_DIALOG_SPLITTER_PROPORTION" value="0.8"/>
</component>
<component name="Cvs2Configuration">
<option name="ON_FILE_ADDING" value="0"/>
<option name="ON_FILE_REMOVING" value="0"/>
<option name="PRUNE_EMPTY_DIRECTORIES" value="true"/>
<option name="SHOW_UPDATE_OPTIONS" value="true"/>
<option name="SHOW_ADD_OPTIONS" value="true"/>
<option name="SHOW_REMOVE_OPTIONS" value="true"/>
<option name="MERGING_MODE" value="0"/>
<option name="MERGE_WITH_BRANCH1_NAME" value="HEAD"/>
<option name="MERGE_WITH_BRANCH2_NAME" value="HEAD"/>
<option name="RESET_STICKY" value="false"/>
<option name="CREATE_NEW_DIRECTORIES" value="true"/>
<option name="DEFAULT_TEXT_FILE_SUBSTITUTION" value="kv"/>
<option name="PROCESS_UNKNOWN_FILES" value="false"/>
<option name="PROCESS_DELETED_FILES" value="false"/>
<option name="SHOW_EDIT_DIALOG" value="true"/>
<option name="RESERVED_EDIT" value="false"/>
<option name="FILE_HISTORY_SPLITTER_PROPORTION" value="0.6"/>
<option name="SHOW_CHECKOUT_OPTIONS" value="true"/>
<option name="CHECKOUT_DATE_OR_REVISION_SETTINGS">
<value>
<option name="BRANCH" value=""/>
<option name="DATE" value=""/>
<option name="USE_BRANCH" value="false"/>
<option name="USE_DATE" value="false"/>
</value>
</option>
<option name="UPDATE_DATE_OR_REVISION_SETTINGS">
<value>
<option name="BRANCH" value=""/>
<option name="DATE" value=""/>
<option name="USE_BRANCH" value="false"/>
<option name="USE_DATE" value="false"/>
</value>
</option>
<option name="SHOW_CHANGES_REVISION_SETTINGS">
<value>
<option name="BRANCH" value=""/>
<option name="DATE" value=""/>
<option name="USE_BRANCH" value="false"/>
<option name="USE_DATE" value="false"/>
</value>
</option>
<option name="SHOW_OUTPUT" value="false"/>
<option name="SHOW_FILE_HISTORY_AS_TREE" value="false"/>
<option name="UPDATE_GROUP_BY_PACKAGES" value="false"/>
<option name="ADD_WATCH_INDEX" value="0"/>
<option name="REMOVE_WATCH_INDEX" value="0"/>
<option name="UPDATE_KEYWORD_SUBSTITUTION"/>
<option name="MAKE_NEW_FILES_READONLY" value="false"/>
<option name="SHOW_CORRUPTED_PROJECT_FILES" value="0"/>
<option name="TAG_AFTER_FILE_COMMIT" value="false"/>
<option name="TAG_AFTER_FILE_COMMIT_NAME" value=""/>
<option name="TAG_AFTER_PROJECT_COMMIT" value="false"/>
<option name="TAG_AFTER_PROJECT_COMMIT_NAME" value=""/>
<option name="PUT_FOCUS_INTO_COMMENT" value="false"/>
<option name="SHOW_CHECKIN_OPTIONS" value="true"/>
<option name="FORCE_NON_EMPTY_COMMENT" value="false"/>
<option name="LAST_COMMIT_MESSAGE" value=""/>
<option name="SAVE_LAST_COMMIT_MESSAGE" value="true"/>
<option name="CHECKIN_DIALOG_SPLITTER_PROPORTION" value="0.8"/>
<option name="OPTIMIZE_IMPORTS_BEFORE_PROJECT_COMMIT" value="false"/>
<option name="OPTIMIZE_IMPORTS_BEFORE_FILE_COMMIT" value="false"/>
<option name="REFORMAT_BEFORE_PROJECT_COMMIT" value="false"/>
<option name="REFORMAT_BEFORE_FILE_COMMIT" value="false"/>
<option name="FILE_HISTORY_DIALOG_COMMENTS_SPLITTER_PROPORTION" value="0.8"/>
<option name="FILE_HISTORY_DIALOG_SPLITTER_PROPORTION" value="0.5"/>
</component>
<component name="CvsTabbedWindow"/>
<component name="SvnConfiguration">
<option name="USER" value=""/>
<option name="PASSWORD" value=""/>
<option name="AUTO_ADD_FILES" value="0"/>
<option name="AUTO_DEL_FILES" value="0"/>
</component>
<component name="PerforceConfiguration">
<option name="PORT" value="magic:1666"/>
<option name="USER" value=""/>
<option name="PASSWORD" value=""/>
<option name="CLIENT" value=""/>
<option name="TRACE" value="false"/>
<option name="PERFORCE_STATUS" value="true"/>
<option name="CHANGELIST_OPTION" value="false"/>
<option name="SYSTEMROOT" value=""/>
<option name="P4_EXECUTABLE" value="p4"/>
<option name="SHOW_BRANCH_HISTORY" value="false"/>
<option name="GENERATE_COMMENT" value="false"/>
<option name="SYNC_OPTION" value="Sync"/>
<option name="PUT_FOCUS_INTO_COMMENT" value="false"/>
<option name="SHOW_CHECKIN_OPTIONS" value="true"/>
<option name="FORCE_NON_EMPTY_COMMENT" value="true"/>
<option name="LAST_COMMIT_MESSAGE" value=""/>
<option name="SAVE_LAST_COMMIT_MESSAGE" value="true"/>
<option name="CHECKIN_DIALOG_SPLITTER_PROPORTION" value="0.8"/>
<option name="OPTIMIZE_IMPORTS_BEFORE_PROJECT_COMMIT" value="false"/>
<option name="OPTIMIZE_IMPORTS_BEFORE_FILE_COMMIT" value="false"/>
<option name="REFORMAT_BEFORE_PROJECT_COMMIT" value="false"/>
<option name="REFORMAT_BEFORE_FILE_COMMIT" value="false"/>
<option name="FILE_HISTORY_DIALOG_COMMENTS_SPLITTER_PROPORTION" value="0.8"/>
<option name="FILE_HISTORY_DIALOG_SPLITTER_PROPORTION" value="0.5"/>
</component>
</project>

@ -0,0 +1,2 @@
{"type_source": "SVN", "goal": "package -U source:jar",
"url": "http://svn-public.driver.research-infrastructures.eu/driver/dnet45/modules/dnet-openaire-data-protos/trunk/", "deploy_repository": "dnet45-snapshots", "version": "4", "mail": "sandro.labruzzo@isti.cnr.it,michele.artini@isti.cnr.it, claudio.atzori@isti.cnr.it, alessia.bardi@isti.cnr.it", "deploy_repository_url": "http://maven.research-infrastructures.eu/nexus/content/repositories/dnet45-snapshots", "name": "dnet-openaire-data-protos"}

@ -0,0 +1,58 @@
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<parent>
<groupId>eu.dnetlib</groupId>
<artifactId>dnet45-parent</artifactId>
<version>1.0.0</version>
<relativePath />
</parent>
<modelVersion>4.0.0</modelVersion>
<groupId>eu.dnetlib</groupId>
<artifactId>dnet-openaire-data-protos</artifactId>
<packaging>jar</packaging>
<version>3.9.4-proto250</version>
<properties>
<!-- defined also in dnet-parent, here in case we need to override -->
<google.protobuf.version>2.4.1</google.protobuf.version>
</properties>
<pluginRepositories>
<pluginRepository>
<id>dnet4-bootstrap-release</id>
<url>http://maven.research-infrastructures.eu/nexus/content/repositories/dnet4-bootstrap-release/</url>
</pluginRepository>
</pluginRepositories>
<build>
<plugins>
<plugin>
<groupId>eu.dnetlib</groupId>
<artifactId>protoc-jar-maven-plugin</artifactId>
<version>1.1.0</version>
<executions>
<execution>
<phase>generate-sources</phase>
<goals>
<goal>run</goal>
</goals>
<configuration>
<protocVersion>${google.protobuf.version}</protocVersion>
<inputDirectories>
<include>src/main/resources</include>
</inputDirectories>
<outputDirectory>src/gen/java</outputDirectory>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
<dependencies>
<dependency>
<groupId>com.google.protobuf</groupId>
<artifactId>protobuf-java</artifactId>
<version>${google.protobuf.version}</version>
</dependency>
</dependencies>
</project>

@ -0,0 +1,564 @@
// Generated by the protocol buffer compiler. DO NOT EDIT!
// source: Dedup.proto
package eu.dnetlib.data.proto;
public final class DedupProtos {
private DedupProtos() {}
public static void registerAllExtensions(
com.google.protobuf.ExtensionRegistry registry) {
}
public interface DedupOrBuilder
extends com.google.protobuf.MessageOrBuilder {
// required .eu.dnetlib.data.proto.RelMetadata relMetadata = 1;
boolean hasRelMetadata();
eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata getRelMetadata();
eu.dnetlib.data.proto.RelMetadataProtos.RelMetadataOrBuilder getRelMetadataOrBuilder();
}
public static final class Dedup extends
com.google.protobuf.GeneratedMessage
implements DedupOrBuilder {
// Use Dedup.newBuilder() to construct.
private Dedup(Builder builder) {
super(builder);
}
private Dedup(boolean noInit) {}
private static final Dedup defaultInstance;
public static Dedup getDefaultInstance() {
return defaultInstance;
}
public Dedup getDefaultInstanceForType() {
return defaultInstance;
}
public static final com.google.protobuf.Descriptors.Descriptor
getDescriptor() {
return eu.dnetlib.data.proto.DedupProtos.internal_static_eu_dnetlib_data_proto_Dedup_descriptor;
}
protected com.google.protobuf.GeneratedMessage.FieldAccessorTable
internalGetFieldAccessorTable() {
return eu.dnetlib.data.proto.DedupProtos.internal_static_eu_dnetlib_data_proto_Dedup_fieldAccessorTable;
}
public enum RelName
implements com.google.protobuf.ProtocolMessageEnum {
isMergedIn(0, 1),
merges(1, 2),
;
public static final int isMergedIn_VALUE = 1;
public static final int merges_VALUE = 2;
public final int getNumber() { return value; }
public static RelName valueOf(int value) {
switch (value) {
case 1: return isMergedIn;
case 2: return merges;
default: return null;
}
}
public static com.google.protobuf.Internal.EnumLiteMap<RelName>
internalGetValueMap() {
return internalValueMap;
}
private static com.google.protobuf.Internal.EnumLiteMap<RelName>
internalValueMap =
new com.google.protobuf.Internal.EnumLiteMap<RelName>() {
public RelName findValueByNumber(int number) {
return RelName.valueOf(number);
}
};
public final com.google.protobuf.Descriptors.EnumValueDescriptor
getValueDescriptor() {
return getDescriptor().getValues().get(index);
}
public final com.google.protobuf.Descriptors.EnumDescriptor
getDescriptorForType() {
return getDescriptor();
}
public static final com.google.protobuf.Descriptors.EnumDescriptor
getDescriptor() {
return eu.dnetlib.data.proto.DedupProtos.Dedup.getDescriptor().getEnumTypes().get(0);
}
private static final RelName[] VALUES = {
isMergedIn, merges,
};
public static RelName valueOf(
com.google.protobuf.Descriptors.EnumValueDescriptor desc) {
if (desc.getType() != getDescriptor()) {
throw new java.lang.IllegalArgumentException(
"EnumValueDescriptor is not for this type.");
}
return VALUES[desc.getIndex()];
}
private final int index;
private final int value;
private RelName(int index, int value) {
this.index = index;
this.value = value;
}
// @@protoc_insertion_point(enum_scope:eu.dnetlib.data.proto.Dedup.RelName)
}
private int bitField0_;
// required .eu.dnetlib.data.proto.RelMetadata relMetadata = 1;
public static final int RELMETADATA_FIELD_NUMBER = 1;
private eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata relMetadata_;
public boolean hasRelMetadata() {
return ((bitField0_ & 0x00000001) == 0x00000001);
}
public eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata getRelMetadata() {
return relMetadata_;
}
public eu.dnetlib.data.proto.RelMetadataProtos.RelMetadataOrBuilder getRelMetadataOrBuilder() {
return relMetadata_;
}
private void initFields() {
relMetadata_ = eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata.getDefaultInstance();
}
private byte memoizedIsInitialized = -1;
public final boolean isInitialized() {
byte isInitialized = memoizedIsInitialized;
if (isInitialized != -1) return isInitialized == 1;
if (!hasRelMetadata()) {
memoizedIsInitialized = 0;
return false;
}
if (!getRelMetadata().isInitialized()) {
memoizedIsInitialized = 0;
return false;
}
memoizedIsInitialized = 1;
return true;
}
public void writeTo(com.google.protobuf.CodedOutputStream output)
throws java.io.IOException {
getSerializedSize();
if (((bitField0_ & 0x00000001) == 0x00000001)) {
output.writeMessage(1, relMetadata_);
}
getUnknownFields().writeTo(output);
}
private int memoizedSerializedSize = -1;
public int getSerializedSize() {
int size = memoizedSerializedSize;
if (size != -1) return size;
size = 0;
if (((bitField0_ & 0x00000001) == 0x00000001)) {
size += com.google.protobuf.CodedOutputStream
.computeMessageSize(1, relMetadata_);
}
size += getUnknownFields().getSerializedSize();
memoizedSerializedSize = size;
return size;
}
private static final long serialVersionUID = 0L;
@java.lang.Override
protected java.lang.Object writeReplace()
throws java.io.ObjectStreamException {
return super.writeReplace();
}
public static eu.dnetlib.data.proto.DedupProtos.Dedup parseFrom(
com.google.protobuf.ByteString data)
throws com.google.protobuf.InvalidProtocolBufferException {
return newBuilder().mergeFrom(data).buildParsed();
}
public static eu.dnetlib.data.proto.DedupProtos.Dedup parseFrom(
com.google.protobuf.ByteString data,
com.google.protobuf.ExtensionRegistryLite extensionRegistry)
throws com.google.protobuf.InvalidProtocolBufferException {
return newBuilder().mergeFrom(data, extensionRegistry)
.buildParsed();
}
public static eu.dnetlib.data.proto.DedupProtos.Dedup parseFrom(byte[] data)
throws com.google.protobuf.InvalidProtocolBufferException {
return newBuilder().mergeFrom(data).buildParsed();
}
public static eu.dnetlib.data.proto.DedupProtos.Dedup parseFrom(
byte[] data,
com.google.protobuf.ExtensionRegistryLite extensionRegistry)
throws com.google.protobuf.InvalidProtocolBufferException {
return newBuilder().mergeFrom(data, extensionRegistry)
.buildParsed();
}
public static eu.dnetlib.data.proto.DedupProtos.Dedup parseFrom(java.io.InputStream input)
throws java.io.IOException {
return newBuilder().mergeFrom(input).buildParsed();
}
public static eu.dnetlib.data.proto.DedupProtos.Dedup parseFrom(
java.io.InputStream input,
com.google.protobuf.ExtensionRegistryLite extensionRegistry)
throws java.io.IOException {
return newBuilder().mergeFrom(input, extensionRegistry)
.buildParsed();
}
public static eu.dnetlib.data.proto.DedupProtos.Dedup parseDelimitedFrom(java.io.InputStream input)
throws java.io.IOException {
Builder builder = newBuilder();
if (builder.mergeDelimitedFrom(input)) {
return builder.buildParsed();
} else {
return null;
}
}
public static eu.dnetlib.data.proto.DedupProtos.Dedup parseDelimitedFrom(
java.io.InputStream input,
com.google.protobuf.ExtensionRegistryLite extensionRegistry)
throws java.io.IOException {
Builder builder = newBuilder();
if (builder.mergeDelimitedFrom(input, extensionRegistry)) {
return builder.buildParsed();
} else {
return null;
}
}
public static eu.dnetlib.data.proto.DedupProtos.Dedup parseFrom(
com.google.protobuf.CodedInputStream input)
throws java.io.IOException {
return newBuilder().mergeFrom(input).buildParsed();
}
public static eu.dnetlib.data.proto.DedupProtos.Dedup parseFrom(
com.google.protobuf.CodedInputStream input,
com.google.protobuf.ExtensionRegistryLite extensionRegistry)
throws java.io.IOException {
return newBuilder().mergeFrom(input, extensionRegistry)
.buildParsed();
}
public static Builder newBuilder() { return Builder.create(); }
public Builder newBuilderForType() { return newBuilder(); }
public static Builder newBuilder(eu.dnetlib.data.proto.DedupProtos.Dedup prototype) {
return newBuilder().mergeFrom(prototype);
}
public Builder toBuilder() { return newBuilder(this); }
@java.lang.Override
protected Builder newBuilderForType(
com.google.protobuf.GeneratedMessage.BuilderParent parent) {
Builder builder = new Builder(parent);
return builder;
}
public static final class Builder extends
com.google.protobuf.GeneratedMessage.Builder<Builder>
implements eu.dnetlib.data.proto.DedupProtos.DedupOrBuilder {
public static final com.google.protobuf.Descriptors.Descriptor
getDescriptor() {
return eu.dnetlib.data.proto.DedupProtos.internal_static_eu_dnetlib_data_proto_Dedup_descriptor;
}
protected com.google.protobuf.GeneratedMessage.FieldAccessorTable
internalGetFieldAccessorTable() {
return eu.dnetlib.data.proto.DedupProtos.internal_static_eu_dnetlib_data_proto_Dedup_fieldAccessorTable;
}
// Construct using eu.dnetlib.data.proto.DedupProtos.Dedup.newBuilder()
private Builder() {
maybeForceBuilderInitialization();
}
private Builder(BuilderParent parent) {
super(parent);
maybeForceBuilderInitialization();
}
private void maybeForceBuilderInitialization() {
if (com.google.protobuf.GeneratedMessage.alwaysUseFieldBuilders) {
getRelMetadataFieldBuilder();
}
}
private static Builder create() {
return new Builder();
}
public Builder clear() {
super.clear();
if (relMetadataBuilder_ == null) {
relMetadata_ = eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata.getDefaultInstance();
} else {
relMetadataBuilder_.clear();
}
bitField0_ = (bitField0_ & ~0x00000001);
return this;
}
public Builder clone() {
return create().mergeFrom(buildPartial());
}
public com.google.protobuf.Descriptors.Descriptor
getDescriptorForType() {
return eu.dnetlib.data.proto.DedupProtos.Dedup.getDescriptor();
}
public eu.dnetlib.data.proto.DedupProtos.Dedup getDefaultInstanceForType() {
return eu.dnetlib.data.proto.DedupProtos.Dedup.getDefaultInstance();
}
public eu.dnetlib.data.proto.DedupProtos.Dedup build() {
eu.dnetlib.data.proto.DedupProtos.Dedup result = buildPartial();
if (!result.isInitialized()) {
throw newUninitializedMessageException(result);
}
return result;
}
private eu.dnetlib.data.proto.DedupProtos.Dedup buildParsed()
throws com.google.protobuf.InvalidProtocolBufferException {
eu.dnetlib.data.proto.DedupProtos.Dedup result = buildPartial();
if (!result.isInitialized()) {
throw newUninitializedMessageException(
result).asInvalidProtocolBufferException();
}
return result;
}
public eu.dnetlib.data.proto.DedupProtos.Dedup buildPartial() {
eu.dnetlib.data.proto.DedupProtos.Dedup result = new eu.dnetlib.data.proto.DedupProtos.Dedup(this);
int from_bitField0_ = bitField0_;
int to_bitField0_ = 0;
if (((from_bitField0_ & 0x00000001) == 0x00000001)) {
to_bitField0_ |= 0x00000001;
}
if (relMetadataBuilder_ == null) {
result.relMetadata_ = relMetadata_;
} else {
result.relMetadata_ = relMetadataBuilder_.build();
}
result.bitField0_ = to_bitField0_;
onBuilt();
return result;
}
public Builder mergeFrom(com.google.protobuf.Message other) {
if (other instanceof eu.dnetlib.data.proto.DedupProtos.Dedup) {
return mergeFrom((eu.dnetlib.data.proto.DedupProtos.Dedup)other);
} else {
super.mergeFrom(other);
return this;
}
}
public Builder mergeFrom(eu.dnetlib.data.proto.DedupProtos.Dedup other) {
if (other == eu.dnetlib.data.proto.DedupProtos.Dedup.getDefaultInstance()) return this;
if (other.hasRelMetadata()) {
mergeRelMetadata(other.getRelMetadata());
}
this.mergeUnknownFields(other.getUnknownFields());
return this;
}
public final boolean isInitialized() {
if (!hasRelMetadata()) {
return false;
}
if (!getRelMetadata().isInitialized()) {
return false;
}
return true;
}
public Builder mergeFrom(
com.google.protobuf.CodedInputStream input,
com.google.protobuf.ExtensionRegistryLite extensionRegistry)
throws java.io.IOException {
com.google.protobuf.UnknownFieldSet.Builder unknownFields =
com.google.protobuf.UnknownFieldSet.newBuilder(
this.getUnknownFields());
while (true) {
int tag = input.readTag();
switch (tag) {
case 0:
this.setUnknownFields(unknownFields.build());
onChanged();
return this;
default: {
if (!parseUnknownField(input, unknownFields,
extensionRegistry, tag)) {
this.setUnknownFields(unknownFields.build());
onChanged();
return this;
}
break;
}
case 10: {
eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata.Builder subBuilder = eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata.newBuilder();
if (hasRelMetadata()) {
subBuilder.mergeFrom(getRelMetadata());
}
input.readMessage(subBuilder, extensionRegistry);
setRelMetadata(subBuilder.buildPartial());
break;
}
}
}
}
private int bitField0_;
// required .eu.dnetlib.data.proto.RelMetadata relMetadata = 1;
private eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata relMetadata_ = eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata.getDefaultInstance();
private com.google.protobuf.SingleFieldBuilder<
eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata, eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata.Builder, eu.dnetlib.data.proto.RelMetadataProtos.RelMetadataOrBuilder> relMetadataBuilder_;
public boolean hasRelMetadata() {
return ((bitField0_ & 0x00000001) == 0x00000001);
}
public eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata getRelMetadata() {
if (relMetadataBuilder_ == null) {
return relMetadata_;
} else {
return relMetadataBuilder_.getMessage();
}
}
public Builder setRelMetadata(eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata value) {
if (relMetadataBuilder_ == null) {
if (value == null) {
throw new NullPointerException();
}
relMetadata_ = value;
onChanged();
} else {
relMetadataBuilder_.setMessage(value);
}
bitField0_ |= 0x00000001;
return this;
}
public Builder setRelMetadata(
eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata.Builder builderForValue) {
if (relMetadataBuilder_ == null) {
relMetadata_ = builderForValue.build();
onChanged();
} else {
relMetadataBuilder_.setMessage(builderForValue.build());
}
bitField0_ |= 0x00000001;
return this;
}
public Builder mergeRelMetadata(eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata value) {
if (relMetadataBuilder_ == null) {
if (((bitField0_ & 0x00000001) == 0x00000001) &&
relMetadata_ != eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata.getDefaultInstance()) {
relMetadata_ =
eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata.newBuilder(relMetadata_).mergeFrom(value).buildPartial();
} else {
relMetadata_ = value;
}
onChanged();
} else {
relMetadataBuilder_.mergeFrom(value);
}
bitField0_ |= 0x00000001;
return this;
}
public Builder clearRelMetadata() {
if (relMetadataBuilder_ == null) {
relMetadata_ = eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata.getDefaultInstance();
onChanged();
} else {
relMetadataBuilder_.clear();
}
bitField0_ = (bitField0_ & ~0x00000001);
return this;
}
public eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata.Builder getRelMetadataBuilder() {
bitField0_ |= 0x00000001;
onChanged();
return getRelMetadataFieldBuilder().getBuilder();
}
public eu.dnetlib.data.proto.RelMetadataProtos.RelMetadataOrBuilder getRelMetadataOrBuilder() {
if (relMetadataBuilder_ != null) {
return relMetadataBuilder_.getMessageOrBuilder();
} else {
return relMetadata_;
}
}
private com.google.protobuf.SingleFieldBuilder<
eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata, eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata.Builder, eu.dnetlib.data.proto.RelMetadataProtos.RelMetadataOrBuilder>
getRelMetadataFieldBuilder() {
if (relMetadataBuilder_ == null) {
relMetadataBuilder_ = new com.google.protobuf.SingleFieldBuilder<
eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata, eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata.Builder, eu.dnetlib.data.proto.RelMetadataProtos.RelMetadataOrBuilder>(
relMetadata_,
getParentForChildren(),
isClean());
relMetadata_ = null;
}
return relMetadataBuilder_;
}
// @@protoc_insertion_point(builder_scope:eu.dnetlib.data.proto.Dedup)
}
static {
defaultInstance = new Dedup(true);
defaultInstance.initFields();
}
// @@protoc_insertion_point(class_scope:eu.dnetlib.data.proto.Dedup)
}
private static com.google.protobuf.Descriptors.Descriptor
internal_static_eu_dnetlib_data_proto_Dedup_descriptor;
private static
com.google.protobuf.GeneratedMessage.FieldAccessorTable
internal_static_eu_dnetlib_data_proto_Dedup_fieldAccessorTable;
public static com.google.protobuf.Descriptors.FileDescriptor
getDescriptor() {
return descriptor;
}
private static com.google.protobuf.Descriptors.FileDescriptor
descriptor;
static {
java.lang.String[] descriptorData = {
"\n\013Dedup.proto\022\025eu.dnetlib.data.proto\032\021Re" +
"lMetadata.proto\"g\n\005Dedup\0227\n\013relMetadata\030" +
"\001 \002(\0132\".eu.dnetlib.data.proto.RelMetadat" +
"a\"%\n\007RelName\022\016\n\nisMergedIn\020\001\022\n\n\006merges\020\002" +
"B$\n\025eu.dnetlib.data.protoB\013DedupProtos"
};
com.google.protobuf.Descriptors.FileDescriptor.InternalDescriptorAssigner assigner =
new com.google.protobuf.Descriptors.FileDescriptor.InternalDescriptorAssigner() {
public com.google.protobuf.ExtensionRegistry assignDescriptors(
com.google.protobuf.Descriptors.FileDescriptor root) {
descriptor = root;
internal_static_eu_dnetlib_data_proto_Dedup_descriptor =
getDescriptor().getMessageTypes().get(0);
internal_static_eu_dnetlib_data_proto_Dedup_fieldAccessorTable = new
com.google.protobuf.GeneratedMessage.FieldAccessorTable(
internal_static_eu_dnetlib_data_proto_Dedup_descriptor,
new java.lang.String[] { "RelMetadata", },
eu.dnetlib.data.proto.DedupProtos.Dedup.class,
eu.dnetlib.data.proto.DedupProtos.Dedup.Builder.class);
return null;
}
};
com.google.protobuf.Descriptors.FileDescriptor
.internalBuildGeneratedFileFrom(descriptorData,
new com.google.protobuf.Descriptors.FileDescriptor[] {
eu.dnetlib.data.proto.RelMetadataProtos.getDescriptor(),
}, assigner);
}
// @@protoc_insertion_point(outer_class_scope)
}

@ -0,0 +1,562 @@
// Generated by the protocol buffer compiler. DO NOT EDIT!
// source: DedupSimilarity.proto
package eu.dnetlib.data.proto;
public final class DedupSimilarityProtos {
private DedupSimilarityProtos() {}
public static void registerAllExtensions(
com.google.protobuf.ExtensionRegistry registry) {
}
public interface DedupSimilarityOrBuilder
extends com.google.protobuf.MessageOrBuilder {
// required .eu.dnetlib.data.proto.RelMetadata relMetadata = 1;
boolean hasRelMetadata();
eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata getRelMetadata();
eu.dnetlib.data.proto.RelMetadataProtos.RelMetadataOrBuilder getRelMetadataOrBuilder();
}
public static final class DedupSimilarity extends
com.google.protobuf.GeneratedMessage
implements DedupSimilarityOrBuilder {
// Use DedupSimilarity.newBuilder() to construct.
private DedupSimilarity(Builder builder) {
super(builder);
}
private DedupSimilarity(boolean noInit) {}
private static final DedupSimilarity defaultInstance;
public static DedupSimilarity getDefaultInstance() {
return defaultInstance;
}
public DedupSimilarity getDefaultInstanceForType() {
return defaultInstance;
}
public static final com.google.protobuf.Descriptors.Descriptor
getDescriptor() {
return eu.dnetlib.data.proto.DedupSimilarityProtos.internal_static_eu_dnetlib_data_proto_DedupSimilarity_descriptor;
}
protected com.google.protobuf.GeneratedMessage.FieldAccessorTable
internalGetFieldAccessorTable() {
return eu.dnetlib.data.proto.DedupSimilarityProtos.internal_static_eu_dnetlib_data_proto_DedupSimilarity_fieldAccessorTable;
}
public enum RelName
implements com.google.protobuf.ProtocolMessageEnum {
isSimilarTo(0, 1),
;
public static final int isSimilarTo_VALUE = 1;
public final int getNumber() { return value; }
public static RelName valueOf(int value) {
switch (value) {
case 1: return isSimilarTo;
default: return null;
}
}
public static com.google.protobuf.Internal.EnumLiteMap<RelName>
internalGetValueMap() {
return internalValueMap;
}
private static com.google.protobuf.Internal.EnumLiteMap<RelName>
internalValueMap =
new com.google.protobuf.Internal.EnumLiteMap<RelName>() {
public RelName findValueByNumber(int number) {
return RelName.valueOf(number);
}
};
public final com.google.protobuf.Descriptors.EnumValueDescriptor
getValueDescriptor() {
return getDescriptor().getValues().get(index);
}
public final com.google.protobuf.Descriptors.EnumDescriptor
getDescriptorForType() {
return getDescriptor();
}
public static final com.google.protobuf.Descriptors.EnumDescriptor
getDescriptor() {
return eu.dnetlib.data.proto.DedupSimilarityProtos.DedupSimilarity.getDescriptor().getEnumTypes().get(0);
}
private static final RelName[] VALUES = {
isSimilarTo,
};
public static RelName valueOf(
com.google.protobuf.Descriptors.EnumValueDescriptor desc) {
if (desc.getType() != getDescriptor()) {
throw new java.lang.IllegalArgumentException(
"EnumValueDescriptor is not for this type.");
}
return VALUES[desc.getIndex()];
}
private final int index;
private final int value;
private RelName(int index, int value) {
this.index = index;
this.value = value;
}
// @@protoc_insertion_point(enum_scope:eu.dnetlib.data.proto.DedupSimilarity.RelName)
}
private int bitField0_;
// required .eu.dnetlib.data.proto.RelMetadata relMetadata = 1;
public static final int RELMETADATA_FIELD_NUMBER = 1;
private eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata relMetadata_;
public boolean hasRelMetadata() {
return ((bitField0_ & 0x00000001) == 0x00000001);
}
public eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata getRelMetadata() {
return relMetadata_;
}
public eu.dnetlib.data.proto.RelMetadataProtos.RelMetadataOrBuilder getRelMetadataOrBuilder() {
return relMetadata_;
}
private void initFields() {
relMetadata_ = eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata.getDefaultInstance();
}
private byte memoizedIsInitialized = -1;
public final boolean isInitialized() {
byte isInitialized = memoizedIsInitialized;
if (isInitialized != -1) return isInitialized == 1;
if (!hasRelMetadata()) {
memoizedIsInitialized = 0;
return false;
}
if (!getRelMetadata().isInitialized()) {
memoizedIsInitialized = 0;
return false;
}
memoizedIsInitialized = 1;
return true;
}
public void writeTo(com.google.protobuf.CodedOutputStream output)
throws java.io.IOException {
getSerializedSize();
if (((bitField0_ & 0x00000001) == 0x00000001)) {
output.writeMessage(1, relMetadata_);
}
getUnknownFields().writeTo(output);
}
private int memoizedSerializedSize = -1;
public int getSerializedSize() {
int size = memoizedSerializedSize;
if (size != -1) return size;
size = 0;
if (((bitField0_ & 0x00000001) == 0x00000001)) {
size += com.google.protobuf.CodedOutputStream
.computeMessageSize(1, relMetadata_);
}
size += getUnknownFields().getSerializedSize();
memoizedSerializedSize = size;
return size;
}
private static final long serialVersionUID = 0L;
@java.lang.Override
protected java.lang.Object writeReplace()
throws java.io.ObjectStreamException {
return super.writeReplace();
}
public static eu.dnetlib.data.proto.DedupSimilarityProtos.DedupSimilarity parseFrom(
com.google.protobuf.ByteString data)
throws com.google.protobuf.InvalidProtocolBufferException {
return newBuilder().mergeFrom(data).buildParsed();
}
public static eu.dnetlib.data.proto.DedupSimilarityProtos.DedupSimilarity parseFrom(
com.google.protobuf.ByteString data,
com.google.protobuf.ExtensionRegistryLite extensionRegistry)
throws com.google.protobuf.InvalidProtocolBufferException {
return newBuilder().mergeFrom(data, extensionRegistry)
.buildParsed();
}
public static eu.dnetlib.data.proto.DedupSimilarityProtos.DedupSimilarity parseFrom(byte[] data)
throws com.google.protobuf.InvalidProtocolBufferException {
return newBuilder().mergeFrom(data).buildParsed();
}
public static eu.dnetlib.data.proto.DedupSimilarityProtos.DedupSimilarity parseFrom(
byte[] data,
com.google.protobuf.ExtensionRegistryLite extensionRegistry)
throws com.google.protobuf.InvalidProtocolBufferException {
return newBuilder().mergeFrom(data, extensionRegistry)
.buildParsed();
}
public static eu.dnetlib.data.proto.DedupSimilarityProtos.DedupSimilarity parseFrom(java.io.InputStream input)
throws java.io.IOException {
return newBuilder().mergeFrom(input).buildParsed();
}
public static eu.dnetlib.data.proto.DedupSimilarityProtos.DedupSimilarity parseFrom(
java.io.InputStream input,
com.google.protobuf.ExtensionRegistryLite extensionRegistry)
throws java.io.IOException {
return newBuilder().mergeFrom(input, extensionRegistry)
.buildParsed();
}
public static eu.dnetlib.data.proto.DedupSimilarityProtos.DedupSimilarity parseDelimitedFrom(java.io.InputStream input)
throws java.io.IOException {
Builder builder = newBuilder();
if (builder.mergeDelimitedFrom(input)) {
return builder.buildParsed();
} else {
return null;
}
}
public static eu.dnetlib.data.proto.DedupSimilarityProtos.DedupSimilarity parseDelimitedFrom(
java.io.InputStream input,
com.google.protobuf.ExtensionRegistryLite extensionRegistry)
throws java.io.IOException {
Builder builder = newBuilder();
if (builder.mergeDelimitedFrom(input, extensionRegistry)) {
return builder.buildParsed();
} else {
return null;
}
}
public static eu.dnetlib.data.proto.DedupSimilarityProtos.DedupSimilarity parseFrom(
com.google.protobuf.CodedInputStream input)
throws java.io.IOException {
return newBuilder().mergeFrom(input).buildParsed();
}
public static eu.dnetlib.data.proto.DedupSimilarityProtos.DedupSimilarity parseFrom(
com.google.protobuf.CodedInputStream input,
com.google.protobuf.ExtensionRegistryLite extensionRegistry)
throws java.io.IOException {
return newBuilder().mergeFrom(input, extensionRegistry)
.buildParsed();
}
public static Builder newBuilder() { return Builder.create(); }
public Builder newBuilderForType() { return newBuilder(); }
public static Builder newBuilder(eu.dnetlib.data.proto.DedupSimilarityProtos.DedupSimilarity prototype) {
return newBuilder().mergeFrom(prototype);
}
public Builder toBuilder() { return newBuilder(this); }
@java.lang.Override
protected Builder newBuilderForType(
com.google.protobuf.GeneratedMessage.BuilderParent parent) {
Builder builder = new Builder(parent);
return builder;
}
public static final class Builder extends
com.google.protobuf.GeneratedMessage.Builder<Builder>
implements eu.dnetlib.data.proto.DedupSimilarityProtos.DedupSimilarityOrBuilder {
public static final com.google.protobuf.Descriptors.Descriptor
getDescriptor() {
return eu.dnetlib.data.proto.DedupSimilarityProtos.internal_static_eu_dnetlib_data_proto_DedupSimilarity_descriptor;
}
protected com.google.protobuf.GeneratedMessage.FieldAccessorTable
internalGetFieldAccessorTable() {
return eu.dnetlib.data.proto.DedupSimilarityProtos.internal_static_eu_dnetlib_data_proto_DedupSimilarity_fieldAccessorTable;
}
// Construct using eu.dnetlib.data.proto.DedupSimilarityProtos.DedupSimilarity.newBuilder()
private Builder() {
maybeForceBuilderInitialization();
}
private Builder(BuilderParent parent) {
super(parent);
maybeForceBuilderInitialization();
}
private void maybeForceBuilderInitialization() {
if (com.google.protobuf.GeneratedMessage.alwaysUseFieldBuilders) {
getRelMetadataFieldBuilder();
}
}
private static Builder create() {
return new Builder();
}
public Builder clear() {
super.clear();
if (relMetadataBuilder_ == null) {
relMetadata_ = eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata.getDefaultInstance();
} else {
relMetadataBuilder_.clear();
}
bitField0_ = (bitField0_ & ~0x00000001);
return this;
}
public Builder clone() {
return create().mergeFrom(buildPartial());
}
public com.google.protobuf.Descriptors.Descriptor
getDescriptorForType() {
return eu.dnetlib.data.proto.DedupSimilarityProtos.DedupSimilarity.getDescriptor();
}
public eu.dnetlib.data.proto.DedupSimilarityProtos.DedupSimilarity getDefaultInstanceForType() {
return eu.dnetlib.data.proto.DedupSimilarityProtos.DedupSimilarity.getDefaultInstance();
}
public eu.dnetlib.data.proto.DedupSimilarityProtos.DedupSimilarity build() {
eu.dnetlib.data.proto.DedupSimilarityProtos.DedupSimilarity result = buildPartial();
if (!result.isInitialized()) {
throw newUninitializedMessageException(result);
}
return result;
}
private eu.dnetlib.data.proto.DedupSimilarityProtos.DedupSimilarity buildParsed()
throws com.google.protobuf.InvalidProtocolBufferException {
eu.dnetlib.data.proto.DedupSimilarityProtos.DedupSimilarity result = buildPartial();
if (!result.isInitialized()) {
throw newUninitializedMessageException(
result).asInvalidProtocolBufferException();
}
return result;
}
public eu.dnetlib.data.proto.DedupSimilarityProtos.DedupSimilarity buildPartial() {
eu.dnetlib.data.proto.DedupSimilarityProtos.DedupSimilarity result = new eu.dnetlib.data.proto.DedupSimilarityProtos.DedupSimilarity(this);
int from_bitField0_ = bitField0_;
int to_bitField0_ = 0;
if (((from_bitField0_ & 0x00000001) == 0x00000001)) {
to_bitField0_ |= 0x00000001;
}
if (relMetadataBuilder_ == null) {
result.relMetadata_ = relMetadata_;
} else {
result.relMetadata_ = relMetadataBuilder_.build();
}
result.bitField0_ = to_bitField0_;
onBuilt();
return result;
}
public Builder mergeFrom(com.google.protobuf.Message other) {
if (other instanceof eu.dnetlib.data.proto.DedupSimilarityProtos.DedupSimilarity) {
return mergeFrom((eu.dnetlib.data.proto.DedupSimilarityProtos.DedupSimilarity)other);
} else {
super.mergeFrom(other);
return this;
}
}
public Builder mergeFrom(eu.dnetlib.data.proto.DedupSimilarityProtos.DedupSimilarity other) {
if (other == eu.dnetlib.data.proto.DedupSimilarityProtos.DedupSimilarity.getDefaultInstance()) return this;
if (other.hasRelMetadata()) {
mergeRelMetadata(other.getRelMetadata());
}
this.mergeUnknownFields(other.getUnknownFields());
return this;
}
public final boolean isInitialized() {
if (!hasRelMetadata()) {
return false;
}
if (!getRelMetadata().isInitialized()) {
return false;
}
return true;
}
public Builder mergeFrom(
com.google.protobuf.CodedInputStream input,
com.google.protobuf.ExtensionRegistryLite extensionRegistry)
throws java.io.IOException {
com.google.protobuf.UnknownFieldSet.Builder unknownFields =
com.google.protobuf.UnknownFieldSet.newBuilder(
this.getUnknownFields());
while (true) {
int tag = input.readTag();
switch (tag) {
case 0:
this.setUnknownFields(unknownFields.build());
onChanged();
return this;
default: {
if (!parseUnknownField(input, unknownFields,
extensionRegistry, tag)) {
this.setUnknownFields(unknownFields.build());
onChanged();
return this;
}
break;
}
case 10: {
eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata.Builder subBuilder = eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata.newBuilder();
if (hasRelMetadata()) {
subBuilder.mergeFrom(getRelMetadata());
}
input.readMessage(subBuilder, extensionRegistry);
setRelMetadata(subBuilder.buildPartial());
break;
}
}
}
}
private int bitField0_;
// required .eu.dnetlib.data.proto.RelMetadata relMetadata = 1;
private eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata relMetadata_ = eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata.getDefaultInstance();
private com.google.protobuf.SingleFieldBuilder<
eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata, eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata.Builder, eu.dnetlib.data.proto.RelMetadataProtos.RelMetadataOrBuilder> relMetadataBuilder_;
public boolean hasRelMetadata() {
return ((bitField0_ & 0x00000001) == 0x00000001);
}
public eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata getRelMetadata() {
if (relMetadataBuilder_ == null) {
return relMetadata_;
} else {
return relMetadataBuilder_.getMessage();
}
}
public Builder setRelMetadata(eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata value) {
if (relMetadataBuilder_ == null) {
if (value == null) {
throw new NullPointerException();
}
relMetadata_ = value;
onChanged();
} else {
relMetadataBuilder_.setMessage(value);
}
bitField0_ |= 0x00000001;
return this;
}
public Builder setRelMetadata(
eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata.Builder builderForValue) {
if (relMetadataBuilder_ == null) {
relMetadata_ = builderForValue.build();
onChanged();
} else {
relMetadataBuilder_.setMessage(builderForValue.build());
}
bitField0_ |= 0x00000001;
return this;
}
public Builder mergeRelMetadata(eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata value) {
if (relMetadataBuilder_ == null) {
if (((bitField0_ & 0x00000001) == 0x00000001) &&
relMetadata_ != eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata.getDefaultInstance()) {
relMetadata_ =
eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata.newBuilder(relMetadata_).mergeFrom(value).buildPartial();
} else {
relMetadata_ = value;
}
onChanged();
} else {
relMetadataBuilder_.mergeFrom(value);
}
bitField0_ |= 0x00000001;
return this;
}
public Builder clearRelMetadata() {
if (relMetadataBuilder_ == null) {
relMetadata_ = eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata.getDefaultInstance();
onChanged();
} else {
relMetadataBuilder_.clear();
}
bitField0_ = (bitField0_ & ~0x00000001);
return this;
}
public eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata.Builder getRelMetadataBuilder() {
bitField0_ |= 0x00000001;
onChanged();
return getRelMetadataFieldBuilder().getBuilder();
}
public eu.dnetlib.data.proto.RelMetadataProtos.RelMetadataOrBuilder getRelMetadataOrBuilder() {
if (relMetadataBuilder_ != null) {
return relMetadataBuilder_.getMessageOrBuilder();
} else {
return relMetadata_;
}
}
private com.google.protobuf.SingleFieldBuilder<
eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata, eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata.Builder, eu.dnetlib.data.proto.RelMetadataProtos.RelMetadataOrBuilder>
getRelMetadataFieldBuilder() {
if (relMetadataBuilder_ == null) {
relMetadataBuilder_ = new com.google.protobuf.SingleFieldBuilder<
eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata, eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata.Builder, eu.dnetlib.data.proto.RelMetadataProtos.RelMetadataOrBuilder>(
relMetadata_,
getParentForChildren(),
isClean());
relMetadata_ = null;
}
return relMetadataBuilder_;
}
// @@protoc_insertion_point(builder_scope:eu.dnetlib.data.proto.DedupSimilarity)
}
static {
defaultInstance = new DedupSimilarity(true);
defaultInstance.initFields();
}
// @@protoc_insertion_point(class_scope:eu.dnetlib.data.proto.DedupSimilarity)
}
private static com.google.protobuf.Descriptors.Descriptor
internal_static_eu_dnetlib_data_proto_DedupSimilarity_descriptor;
private static
com.google.protobuf.GeneratedMessage.FieldAccessorTable
internal_static_eu_dnetlib_data_proto_DedupSimilarity_fieldAccessorTable;
public static com.google.protobuf.Descriptors.FileDescriptor
getDescriptor() {
return descriptor;
}
private static com.google.protobuf.Descriptors.FileDescriptor
descriptor;
static {
java.lang.String[] descriptorData = {
"\n\025DedupSimilarity.proto\022\025eu.dnetlib.data" +
".proto\032\021RelMetadata.proto\"f\n\017DedupSimila" +
"rity\0227\n\013relMetadata\030\001 \002(\0132\".eu.dnetlib.d" +
"ata.proto.RelMetadata\"\032\n\007RelName\022\017\n\013isSi" +
"milarTo\020\001B.\n\025eu.dnetlib.data.protoB\025Dedu" +
"pSimilarityProtos"
};
com.google.protobuf.Descriptors.FileDescriptor.InternalDescriptorAssigner assigner =
new com.google.protobuf.Descriptors.FileDescriptor.InternalDescriptorAssigner() {
public com.google.protobuf.ExtensionRegistry assignDescriptors(
com.google.protobuf.Descriptors.FileDescriptor root) {
descriptor = root;
internal_static_eu_dnetlib_data_proto_DedupSimilarity_descriptor =
getDescriptor().getMessageTypes().get(0);
internal_static_eu_dnetlib_data_proto_DedupSimilarity_fieldAccessorTable = new
com.google.protobuf.GeneratedMessage.FieldAccessorTable(
internal_static_eu_dnetlib_data_proto_DedupSimilarity_descriptor,
new java.lang.String[] { "RelMetadata", },
eu.dnetlib.data.proto.DedupSimilarityProtos.DedupSimilarity.class,
eu.dnetlib.data.proto.DedupSimilarityProtos.DedupSimilarity.Builder.class);
return null;
}
};
com.google.protobuf.Descriptors.FileDescriptor
.internalBuildGeneratedFileFrom(descriptorData,
new com.google.protobuf.Descriptors.FileDescriptor[] {
eu.dnetlib.data.proto.RelMetadataProtos.getDescriptor(),
}, assigner);
}
// @@protoc_insertion_point(outer_class_scope)
}
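The generated `DedupSimilarity.Builder` above tracks field presence with a bit mask, `bitField0_`:

* the setters OR in `0x00000001`;
* `clearRelMetadata()` masks that bit out;
* because `relMetadata` is declared `required`, `isInitialized()` returns `false` until the bit is set.

Below is a minimal, dependency-free sketch of that bookkeeping. It is not the generated code. The class name `HasBitSketch` is hypothetical, and a plain `String` stands in for the `RelMetadata` message, so the nested `getRelMetadata().isInitialized()` check is left out.

```java
// Hypothetical illustration of the has-bit pattern used by the generated
// DedupSimilarity.Builder; a String replaces the RelMetadata message type.
final class HasBitSketch {
    // Bit 0x00000001 records whether field 1 (relMetadata) has been set,
    // as in the generated hasRelMetadata().
    private static final int REL_METADATA_BIT = 0x00000001;

    private int bitField0 = 0;
    private String relMetadata = ""; // stands in for RelMetadata.getDefaultInstance()

    boolean hasRelMetadata() {
        return (bitField0 & REL_METADATA_BIT) == REL_METADATA_BIT;
    }

    String getRelMetadata() {
        return relMetadata;
    }

    HasBitSketch setRelMetadata(String value) {
        if (value == null) {
            throw new NullPointerException(); // the generated setter also rejects null
        }
        relMetadata = value;
        bitField0 |= REL_METADATA_BIT; // mark the field as present
        return this;
    }

    HasBitSketch clearRelMetadata() {
        relMetadata = "";              // back to the default value
        bitField0 &= ~REL_METADATA_BIT; // mark the field as absent
        return this;
    }

    // relMetadata is "required", so the message is only initialized
    // once the presence bit is set.
    boolean isInitialized() {
        return hasRelMetadata();
    }
}
```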

File diff suppressed because it is too large

@@ -0,0 +1,108 @@
// Generated by the protocol buffer compiler. DO NOT EDIT!
// source: Kind.proto
package eu.dnetlib.data.proto;
public final class KindProtos {
private KindProtos() {}
public static void registerAllExtensions(
com.google.protobuf.ExtensionRegistry registry) {
}
public enum Kind
implements com.google.protobuf.ProtocolMessageEnum {
entity(0, 1),
relation(1, 2),
;
public static final int entity_VALUE = 1;
public static final int relation_VALUE = 2;
public final int getNumber() { return value; }
public static Kind valueOf(int value) {
switch (value) {
case 1: return entity;
case 2: return relation;
default: return null;
}
}
public static com.google.protobuf.Internal.EnumLiteMap<Kind>
internalGetValueMap() {
return internalValueMap;
}
private static com.google.protobuf.Internal.EnumLiteMap<Kind>
internalValueMap =
new com.google.protobuf.Internal.EnumLiteMap<Kind>() {
public Kind findValueByNumber(int number) {
return Kind.valueOf(number);
}
};
public final com.google.protobuf.Descriptors.EnumValueDescriptor
getValueDescriptor() {
return getDescriptor().getValues().get(index);
}
public final com.google.protobuf.Descriptors.EnumDescriptor
getDescriptorForType() {
return getDescriptor();
}
public static final com.google.protobuf.Descriptors.EnumDescriptor
getDescriptor() {
return eu.dnetlib.data.proto.KindProtos.getDescriptor().getEnumTypes().get(0);
}
private static final Kind[] VALUES = {
entity, relation,
};
public static Kind valueOf(
com.google.protobuf.Descriptors.EnumValueDescriptor desc) {
if (desc.getType() != getDescriptor()) {
throw new java.lang.IllegalArgumentException(
"EnumValueDescriptor is not for this type.");
}
return VALUES[desc.getIndex()];
}
private final int index;
private final int value;
private Kind(int index, int value) {
this.index = index;
this.value = value;
}
// @@protoc_insertion_point(enum_scope:eu.dnetlib.data.proto.Kind)
}
public static com.google.protobuf.Descriptors.FileDescriptor
getDescriptor() {
return descriptor;
}
private static com.google.protobuf.Descriptors.FileDescriptor
descriptor;
static {
java.lang.String[] descriptorData = {
"\n\nKind.proto\022\025eu.dnetlib.data.proto* \n\004K" +
"ind\022\n\n\006entity\020\001\022\014\n\010relation\020\002B#\n\025eu.dnet" +
"lib.data.protoB\nKindProtos"
};
com.google.protobuf.Descriptors.FileDescriptor.InternalDescriptorAssigner assigner =
new com.google.protobuf.Descriptors.FileDescriptor.InternalDescriptorAssigner() {
public com.google.protobuf.ExtensionRegistry assignDescriptors(
com.google.protobuf.Descriptors.FileDescriptor root) {
descriptor = root;
return null;
}
};
com.google.protobuf.Descriptors.FileDescriptor
.internalBuildGeneratedFileFrom(descriptorData,
new com.google.protobuf.Descriptors.FileDescriptor[] {
}, assigner);
}
// @@protoc_insertion_point(outer_class_scope)
}
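In the generated `Kind` enum, each constant carries two integers:

* an index, which is its position in the descriptor and is used by `getValueDescriptor()` and `valueOf(EnumValueDescriptor)`;
* a number, which is the value declared in `Kind.proto`, returned by `getNumber()` and written on the wire.

`valueOf(int)` looks up numbers, not indexes, and returns `null` for unknown values.

Below is a small stand-alone sketch of that mapping without the descriptor plumbing. The names `KindSketch` and `forNumber` are hypothetical.

```java
// Hypothetical stand-in for the generated KindProtos.Kind enum.
enum KindSketch {
    entity(0, 1),
    relation(1, 2);

    private final int index;  // position in the enum descriptor
    private final int number; // value declared in Kind.proto, used on the wire

    KindSketch(int index, int number) {
        this.index = index;
        this.number = number;
    }

    int getIndex() { return index; }

    int getNumber() { return number; }

    // Mirrors the generated Kind.valueOf(int): maps a wire number to a
    // constant and returns null (rather than throwing) for unknown numbers.
    static KindSketch forNumber(int number) {
        switch (number) {
            case 1: return entity;
            case 2: return relation;
            default: return null;
        }
    }
}
```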

File diff suppressed because it is too large

@@ -0,0 +1,651 @@
// Generated by the protocol buffer compiler. DO NOT EDIT!
// source: Organization_Organization.proto
package eu.dnetlib.data.proto;
public final class OrganizationOrganizationProtos {
private OrganizationOrganizationProtos() {}
public static void registerAllExtensions(
com.google.protobuf.ExtensionRegistry registry) {
}
public interface OrganizationOrganizationOrBuilder
extends com.google.protobuf.MessageOrBuilder {
// optional .eu.dnetlib.data.proto.Dedup dedup = 1;
boolean hasDedup();
eu.dnetlib.data.proto.DedupProtos.Dedup getDedup();
eu.dnetlib.data.proto.DedupProtos.DedupOrBuilder getDedupOrBuilder();
// optional .eu.dnetlib.data.proto.DedupSimilarity dedupSimilarity = 2;
boolean hasDedupSimilarity();
eu.dnetlib.data.proto.DedupSimilarityProtos.DedupSimilarity getDedupSimilarity();
eu.dnetlib.data.proto.DedupSimilarityProtos.DedupSimilarityOrBuilder getDedupSimilarityOrBuilder();
}
public static final class OrganizationOrganization extends
com.google.protobuf.GeneratedMessage
implements OrganizationOrganizationOrBuilder {
// Use OrganizationOrganization.newBuilder() to construct.
private OrganizationOrganization(Builder builder) {
super(builder);
}
private OrganizationOrganization(boolean noInit) {}
private static final OrganizationOrganization defaultInstance;
public static OrganizationOrganization getDefaultInstance() {
return defaultInstance;
}
public OrganizationOrganization getDefaultInstanceForType() {
return defaultInstance;
}
public static final com.google.protobuf.Descriptors.Descriptor
getDescriptor() {
return eu.dnetlib.data.proto.OrganizationOrganizationProtos.internal_static_eu_dnetlib_data_proto_OrganizationOrganization_descriptor;
}
protected com.google.protobuf.GeneratedMessage.FieldAccessorTable
internalGetFieldAccessorTable() {
return eu.dnetlib.data.proto.OrganizationOrganizationProtos.internal_static_eu_dnetlib_data_proto_OrganizationOrganization_fieldAccessorTable;
}
private int bitField0_;
// optional .eu.dnetlib.data.proto.Dedup dedup = 1;
public static final int DEDUP_FIELD_NUMBER = 1;
private eu.dnetlib.data.proto.DedupProtos.Dedup dedup_;
public boolean hasDedup() {
return ((bitField0_ & 0x00000001) == 0x00000001);
}
public eu.dnetlib.data.proto.DedupProtos.Dedup getDedup() {
return dedup_;
}
public eu.dnetlib.data.proto.DedupProtos.DedupOrBuilder getDedupOrBuilder() {
return dedup_;
}
// optional .eu.dnetlib.data.proto.DedupSimilarity dedupSimilarity = 2;
public static final int DEDUPSIMILARITY_FIELD_NUMBER = 2;
private eu.dnetlib.data.proto.DedupSimilarityProtos.DedupSimilarity dedupSimilarity_;
public boolean hasDedupSimilarity() {
return ((bitField0_ & 0x00000002) == 0x00000002);
}
public eu.dnetlib.data.proto.DedupSimilarityProtos.DedupSimilarity getDedupSimilarity() {
return dedupSimilarity_;
}
public eu.dnetlib.data.proto.DedupSimilarityProtos.DedupSimilarityOrBuilder getDedupSimilarityOrBuilder() {
return dedupSimilarity_;
}
private void initFields() {
dedup_ = eu.dnetlib.data.proto.DedupProtos.Dedup.getDefaultInstance();
dedupSimilarity_ = eu.dnetlib.data.proto.DedupSimilarityProtos.DedupSimilarity.getDefaultInstance();
}
private byte memoizedIsInitialized = -1;
public final boolean isInitialized() {
byte isInitialized = memoizedIsInitialized;
if (isInitialized != -1) return isInitialized == 1;
if (hasDedup()) {
if (!getDedup().isInitialized()) {
memoizedIsInitialized = 0;
return false;
}
}
if (hasDedupSimilarity()) {
if (!getDedupSimilarity().isInitialized()) {
memoizedIsInitialized = 0;
return false;
}
}
memoizedIsInitialized = 1;
return true;
}
public void writeTo(com.google.protobuf.CodedOutputStream output)
throws java.io.IOException {
getSerializedSize();
if (((bitField0_ & 0x00000001) == 0x00000001)) {
output.writeMessage(1, dedup_);
}
if (((bitField0_ & 0x00000002) == 0x00000002)) {
output.writeMessage(2, dedupSimilarity_);
}
getUnknownFields().writeTo(output);
}
private int memoizedSerializedSize = -1;
public int getSerializedSize() {
int size = memoizedSerializedSize;
if (size != -1) return size;
size = 0;
if (((bitField0_ & 0x00000001) == 0x00000001)) {
size += com.google.protobuf.CodedOutputStream
.computeMessageSize(1, dedup_);
}
if (((bitField0_ & 0x00000002) == 0x00000002)) {
size += com.google.protobuf.CodedOutputStream
.computeMessageSize(2, dedupSimilarity_);
}
size += getUnknownFields().getSerializedSize();
memoizedSerializedSize = size;
return size;
}
private static final long serialVersionUID = 0L;
@java.lang.Override
protected java.lang.Object writeReplace()
throws java.io.ObjectStreamException {
return super.writeReplace();
}
public static eu.dnetlib.data.proto.OrganizationOrganizationProtos.OrganizationOrganization parseFrom(
com.google.protobuf.ByteString data)
throws com.google.protobuf.InvalidProtocolBufferException {
return newBuilder().mergeFrom(data).buildParsed();
}
public static eu.dnetlib.data.proto.OrganizationOrganizationProtos.OrganizationOrganization parseFrom(
com.google.protobuf.ByteString data,
com.google.protobuf.ExtensionRegistryLite extensionRegistry)
throws com.google.protobuf.InvalidProtocolBufferException {
return newBuilder().mergeFrom(data, extensionRegistry)
.buildParsed();
}
public static eu.dnetlib.data.proto.OrganizationOrganizationProtos.OrganizationOrganization parseFrom(byte[] data)
throws com.google.protobuf.InvalidProtocolBufferException {
return newBuilder().mergeFrom(data).buildParsed();
}
public static eu.dnetlib.data.proto.OrganizationOrganizationProtos.OrganizationOrganization parseFrom(
byte[] data,
com.google.protobuf.ExtensionRegistryLite extensionRegistry)
throws com.google.protobuf.InvalidProtocolBufferException {
return newBuilder().mergeFrom(data, extensionRegistry)
.buildParsed();
}
public static eu.dnetlib.data.proto.OrganizationOrganizationProtos.OrganizationOrganization parseFrom(java.io.InputStream input)
throws java.io.IOException {
return newBuilder().mergeFrom(input).buildParsed();
}
public static eu.dnetlib.data.proto.OrganizationOrganizationProtos.OrganizationOrganization parseFrom(
java.io.InputStream input,
com.google.protobuf.ExtensionRegistryLite extensionRegistry)
throws java.io.IOException {
return newBuilder().mergeFrom(input, extensionRegistry)
.buildParsed();
}
public static eu.dnetlib.data.proto.OrganizationOrganizationProtos.OrganizationOrganization parseDelimitedFrom(java.io.InputStream input)
throws java.io.IOException {
Builder builder = newBuilder();
if (builder.mergeDelimitedFrom(input)) {
return builder.buildParsed();
} else {
return null;
}
}
public static eu.dnetlib.data.proto.OrganizationOrganizationProtos.OrganizationOrganization parseDelimitedFrom(
java.io.InputStream input,
com.google.protobuf.ExtensionRegistryLite extensionRegistry)
throws java.io.IOException {
Builder builder = newBuilder();
if (builder.mergeDelimitedFrom(input, extensionRegistry)) {
return builder.buildParsed();
} else {
return null;
}
}
public static eu.dnetlib.data.proto.OrganizationOrganizationProtos.OrganizationOrganization parseFrom(
com.google.protobuf.CodedInputStream input)
throws java.io.IOException {
return newBuilder().mergeFrom(input).buildParsed();
}
public static eu.dnetlib.data.proto.OrganizationOrganizationProtos.OrganizationOrganization parseFrom(
com.google.protobuf.CodedInputStream input,
com.google.protobuf.ExtensionRegistryLite extensionRegistry)
throws java.io.IOException {
return newBuilder().mergeFrom(input, extensionRegistry)
.buildParsed();
}
public static Builder newBuilder() { return Builder.create(); }
public Builder newBuilderForType() { return newBuilder(); }
public static Builder newBuilder(eu.dnetlib.data.proto.OrganizationOrganizationProtos.OrganizationOrganization prototype) {
return newBuilder().mergeFrom(prototype);
}
public Builder toBuilder() { return newBuilder(this); }
@java.lang.Override
protected Builder newBuilderForType(
com.google.protobuf.GeneratedMessage.BuilderParent parent) {
Builder builder = new Builder(parent);
return builder;
}
public static final class Builder extends
com.google.protobuf.GeneratedMessage.Builder<Builder>
implements eu.dnetlib.data.proto.OrganizationOrganizationProtos.OrganizationOrganizationOrBuilder {
public static final com.google.protobuf.Descriptors.Descriptor
getDescriptor() {
return eu.dnetlib.data.proto.OrganizationOrganizationProtos.internal_static_eu_dnetlib_data_proto_OrganizationOrganization_descriptor;
}
protected com.google.protobuf.GeneratedMessage.FieldAccessorTable
internalGetFieldAccessorTable() {
return eu.dnetlib.data.proto.OrganizationOrganizationProtos.internal_static_eu_dnetlib_data_proto_OrganizationOrganization_fieldAccessorTable;
}
// Construct using eu.dnetlib.data.proto.OrganizationOrganizationProtos.OrganizationOrganization.newBuilder()
private Builder() {
maybeForceBuilderInitialization();
}
private Builder(BuilderParent parent) {
super(parent);
maybeForceBuilderInitialization();
}
private void maybeForceBuilderInitialization() {
if (com.google.protobuf.GeneratedMessage.alwaysUseFieldBuilders) {
getDedupFieldBuilder();
getDedupSimilarityFieldBuilder();
}
}
private static Builder create() {
return new Builder();
}
public Builder clear() {
super.clear();
if (dedupBuilder_ == null) {
dedup_ = eu.dnetlib.data.proto.DedupProtos.Dedup.getDefaultInstance();
} else {
dedupBuilder_.clear();
}
bitField0_ = (bitField0_ & ~0x00000001);
if (dedupSimilarityBuilder_ == null) {
dedupSimilarity_ = eu.dnetlib.data.proto.DedupSimilarityProtos.DedupSimilarity.getDefaultInstance();
} else {
dedupSimilarityBuilder_.clear();
}
bitField0_ = (bitField0_ & ~0x00000002);
return this;
}
public Builder clone() {
return create().mergeFrom(buildPartial());
}
public com.google.protobuf.Descriptors.Descriptor
getDescriptorForType() {
return eu.dnetlib.data.proto.OrganizationOrganizationProtos.OrganizationOrganization.getDescriptor();
}
public eu.dnetlib.data.proto.OrganizationOrganizationProtos.OrganizationOrganization getDefaultInstanceForType() {
return eu.dnetlib.data.proto.OrganizationOrganizationProtos.OrganizationOrganization.getDefaultInstance();
}
public eu.dnetlib.data.proto.OrganizationOrganizationProtos.OrganizationOrganization build() {
eu.dnetlib.data.proto.OrganizationOrganizationProtos.OrganizationOrganization result = buildPartial();
if (!result.isInitialized()) {
throw newUninitializedMessageException(result);
}
return result;
}
private eu.dnetlib.data.proto.OrganizationOrganizationProtos.OrganizationOrganization buildParsed()
throws com.google.protobuf.InvalidProtocolBufferException {
eu.dnetlib.data.proto.OrganizationOrganizationProtos.OrganizationOrganization result = buildPartial();
if (!result.isInitialized()) {
throw newUninitializedMessageException(
result).asInvalidProtocolBufferException();
}
return result;
}
public eu.dnetlib.data.proto.OrganizationOrganizationProtos.OrganizationOrganization buildPartial() {
eu.dnetlib.data.proto.OrganizationOrganizationProtos.OrganizationOrganization result = new eu.dnetlib.data.proto.OrganizationOrganizationProtos.OrganizationOrganization(this);
int from_bitField0_ = bitField0_;
int to_bitField0_ = 0;
if (((from_bitField0_ & 0x00000001) == 0x00000001)) {
to_bitField0_ |= 0x00000001;
}
if (dedupBuilder_ == null) {
result.dedup_ = dedup_;
} else {
result.dedup_ = dedupBuilder_.build();
}
if (((from_bitField0_ & 0x00000002) == 0x00000002)) {
to_bitField0_ |= 0x00000002;
}
if (dedupSimilarityBuilder_ == null) {
result.dedupSimilarity_ = dedupSimilarity_;
} else {
result.dedupSimilarity_ = dedupSimilarityBuilder_.build();
}
result.bitField0_ = to_bitField0_;
onBuilt();
return result;
}
public Builder mergeFrom(com.google.protobuf.Message other) {
if (other instanceof eu.dnetlib.data.proto.OrganizationOrganizationProtos.OrganizationOrganization) {
return mergeFrom((eu.dnetlib.data.proto.OrganizationOrganizationProtos.OrganizationOrganization)other);
} else {
super.mergeFrom(other);
return this;
}
}
public Builder mergeFrom(eu.dnetlib.data.proto.OrganizationOrganizationProtos.OrganizationOrganization other) {
if (other == eu.dnetlib.data.proto.OrganizationOrganizationProtos.OrganizationOrganization.getDefaultInstance()) return this;
if (other.hasDedup()) {
mergeDedup(other.getDedup());
}
if (other.hasDedupSimilarity()) {
mergeDedupSimilarity(other.getDedupSimilarity());
}
this.mergeUnknownFields(other.getUnknownFields());
return this;
}
public final boolean isInitialized() {
if (hasDedup()) {
if (!getDedup().isInitialized()) {
return false;
}
}
if (hasDedupSimilarity()) {
if (!getDedupSimilarity().isInitialized()) {
return false;
}
}
return true;
}
public Builder mergeFrom(
com.google.protobuf.CodedInputStream input,
com.google.protobuf.ExtensionRegistryLite extensionRegistry)
throws java.io.IOException {
com.google.protobuf.UnknownFieldSet.Builder unknownFields =
com.google.protobuf.UnknownFieldSet.newBuilder(
this.getUnknownFields());
while (true) {
int tag = input.readTag();
switch (tag) {
case 0:
this.setUnknownFields(unknownFields.build());
onChanged();
return this;
default: {
if (!parseUnknownField(input, unknownFields,
extensionRegistry, tag)) {
this.setUnknownFields(unknownFields.build());
onChanged();
return this;
}
break;
}
case 10: {
eu.dnetlib.data.proto.DedupProtos.Dedup.Builder subBuilder = eu.dnetlib.data.proto.DedupProtos.Dedup.newBuilder();
if (hasDedup()) {
subBuilder.mergeFrom(getDedup());
}
input.readMessage(subBuilder, extensionRegistry);
setDedup(subBuilder.buildPartial());
break;
}
case 18: {
eu.dnetlib.data.proto.DedupSimilarityProtos.DedupSimilarity.Builder subBuilder = eu.dnetlib.data.proto.DedupSimilarityProtos.DedupSimilarity.newBuilder();
if (hasDedupSimilarity()) {
subBuilder.mergeFrom(getDedupSimilarity());
}
input.readMessage(subBuilder, extensionRegistry);
setDedupSimilarity(subBuilder.buildPartial());
break;
}
}
}
}
private int bitField0_;
// optional .eu.dnetlib.data.proto.Dedup dedup = 1;
private eu.dnetlib.data.proto.DedupProtos.Dedup dedup_ = eu.dnetlib.data.proto.DedupProtos.Dedup.getDefaultInstance();
private com.google.protobuf.SingleFieldBuilder<
eu.dnetlib.data.proto.DedupProtos.Dedup, eu.dnetlib.data.proto.DedupProtos.Dedup.Builder, eu.dnetlib.data.proto.DedupProtos.DedupOrBuilder> dedupBuilder_;
public boolean hasDedup() {
return ((bitField0_ & 0x00000001) == 0x00000001);
}
public eu.dnetlib.data.proto.DedupProtos.Dedup getDedup() {
if (dedupBuilder_ == null) {
return dedup_;
} else {
return dedupBuilder_.getMessage();
}
}
public Builder setDedup(eu.dnetlib.data.proto.DedupProtos.Dedup value) {
if (dedupBuilder_ == null) {
if (value == null) {
throw new NullPointerException();
}
dedup_ = value;
onChanged();
} else {
dedupBuilder_.setMessage(value);
}
bitField0_ |= 0x00000001;
return this;
}
public Builder setDedup(
eu.dnetlib.data.proto.DedupProtos.Dedup.Builder builderForValue) {
if (dedupBuilder_ == null) {
dedup_ = builderForValue.build();
onChanged();
} else {
dedupBuilder_.setMessage(builderForValue.build());
}
bitField0_ |= 0x00000001;
return this;
}
public Builder mergeDedup(eu.dnetlib.data.proto.DedupProtos.Dedup value) {
if (dedupBuilder_ == null) {
if (((bitField0_ & 0x00000001) == 0x00000001) &&
dedup_ != eu.dnetlib.data.proto.DedupProtos.Dedup.getDefaultInstance()) {
dedup_ =
eu.dnetlib.data.proto.DedupProtos.Dedup.newBuilder(dedup_).mergeFrom(value).buildPartial();
} else {
dedup_ = value;
}
onChanged();
} else {
dedupBuilder_.mergeFrom(value);
}
bitField0_ |= 0x00000001;
return this;
}
public Builder clearDedup() {
if (dedupBuilder_ == null) {
dedup_ = eu.dnetlib.data.proto.DedupProtos.Dedup.getDefaultInstance();
onChanged();
} else {
dedupBuilder_.clear();
}
bitField0_ = (bitField0_ & ~0x00000001);
return this;
}
public eu.dnetlib.data.proto.DedupProtos.Dedup.Builder getDedupBuilder() {
bitField0_ |= 0x00000001;
onChanged();
return getDedupFieldBuilder().getBuilder();
}
public eu.dnetlib.data.proto.DedupProtos.DedupOrBuilder getDedupOrBuilder() {
if (dedupBuilder_ != null) {
return dedupBuilder_.getMessageOrBuilder();
} else {
return dedup_;
}
}
private com.google.protobuf.SingleFieldBuilder<
eu.dnetlib.data.proto.DedupProtos.Dedup, eu.dnetlib.data.proto.DedupProtos.Dedup.Builder, eu.dnetlib.data.proto.DedupProtos.DedupOrBuilder>
getDedupFieldBuilder() {
if (dedupBuilder_ == null) {
dedupBuilder_ = new com.google.protobuf.SingleFieldBuilder<
eu.dnetlib.data.proto.DedupProtos.Dedup, eu.dnetlib.data.proto.DedupProtos.Dedup.Builder, eu.dnetlib.data.proto.DedupProtos.DedupOrBuilder>(
dedup_,
getParentForChildren(),
isClean());
dedup_ = null;
}
return dedupBuilder_;
}
// optional .eu.dnetlib.data.proto.DedupSimilarity dedupSimilarity = 2;
private eu.dnetlib.data.proto.DedupSimilarityProtos.DedupSimilarity dedupSimilarity_ = eu.dnetlib.data.proto.DedupSimilarityProtos.DedupSimilarity.getDefaultInstance();
private com.google.protobuf.SingleFieldBuilder<
eu.dnetlib.data.proto.DedupSimilarityProtos.DedupSimilarity, eu.dnetlib.data.proto.DedupSimilarityProtos.DedupSimilarity.Builder, eu.dnetlib.data.proto.DedupSimilarityProtos.DedupSimilarityOrBuilder> dedupSimilarityBuilder_;
public boolean hasDedupSimilarity() {
return ((bitField0_ & 0x00000002) == 0x00000002);
}
public eu.dnetlib.data.proto.DedupSimilarityProtos.DedupSimilarity getDedupSimilarity() {
if (dedupSimilarityBuilder_ == null) {
return dedupSimilarity_;
} else {
return dedupSimilarityBuilder_.getMessage();
}
}
public Builder setDedupSimilarity(eu.dnetlib.data.proto.DedupSimilarityProtos.DedupSimilarity value) {
if (dedupSimilarityBuilder_ == null) {
if (value == null) {
throw new NullPointerException();
}
dedupSimilarity_ = value;
onChanged();
} else {
dedupSimilarityBuilder_.setMessage(value);
}
bitField0_ |= 0x00000002;
return this;
}
public Builder setDedupSimilarity(
eu.dnetlib.data.proto.DedupSimilarityProtos.DedupSimilarity.Builder builderForValue) {
if (dedupSimilarityBuilder_ == null) {
dedupSimilarity_ = builderForValue.build();
onChanged();
} else {
dedupSimilarityBuilder_.setMessage(builderForValue.build());
}
bitField0_ |= 0x00000002;
return this;
}
public Builder mergeDedupSimilarity(eu.dnetlib.data.proto.DedupSimilarityProtos.DedupSimilarity value) {
if (dedupSimilarityBuilder_ == null) {
if (((bitField0_ & 0x00000002) == 0x00000002) &&
dedupSimilarity_ != eu.dnetlib.data.proto.DedupSimilarityProtos.DedupSimilarity.getDefaultInstance()) {
dedupSimilarity_ =
eu.dnetlib.data.proto.DedupSimilarityProtos.DedupSimilarity.newBuilder(dedupSimilarity_).mergeFrom(value).buildPartial();
} else {
dedupSimilarity_ = value;
}
onChanged();
} else {
dedupSimilarityBuilder_.mergeFrom(value);
}
bitField0_ |= 0x00000002;
return this;
}
public Builder clearDedupSimilarity() {
if (dedupSimilarityBuilder_ == null) {
dedupSimilarity_ = eu.dnetlib.data.proto.DedupSimilarityProtos.DedupSimilarity.getDefaultInstance();
onChanged();
} else {
dedupSimilarityBuilder_.clear();
}
bitField0_ = (bitField0_ & ~0x00000002);
return this;
}
public eu.dnetlib.data.proto.DedupSimilarityProtos.DedupSimilarity.Builder getDedupSimilarityBuilder() {
bitField0_ |= 0x00000002;
onChanged();
return getDedupSimilarityFieldBuilder().getBuilder();
}
public eu.dnetlib.data.proto.DedupSimilarityProtos.DedupSimilarityOrBuilder getDedupSimilarityOrBuilder() {
if (dedupSimilarityBuilder_ != null) {
return dedupSimilarityBuilder_.getMessageOrBuilder();
} else {
return dedupSimilarity_;
}
}
private com.google.protobuf.SingleFieldBuilder<
eu.dnetlib.data.proto.DedupSimilarityProtos.DedupSimilarity, eu.dnetlib.data.proto.DedupSimilarityProtos.DedupSimilarity.Builder, eu.dnetlib.data.proto.DedupSimilarityProtos.DedupSimilarityOrBuilder>
getDedupSimilarityFieldBuilder() {
if (dedupSimilarityBuilder_ == null) {
dedupSimilarityBuilder_ = new com.google.protobuf.SingleFieldBuilder<
eu.dnetlib.data.proto.DedupSimilarityProtos.DedupSimilarity, eu.dnetlib.data.proto.DedupSimilarityProtos.DedupSimilarity.Builder, eu.dnetlib.data.proto.DedupSimilarityProtos.DedupSimilarityOrBuilder>(
dedupSimilarity_,
getParentForChildren(),
isClean());
dedupSimilarity_ = null;
}
return dedupSimilarityBuilder_;
}
// @@protoc_insertion_point(builder_scope:eu.dnetlib.data.proto.OrganizationOrganization)
}
static {
defaultInstance = new OrganizationOrganization(true);
defaultInstance.initFields();
}
// @@protoc_insertion_point(class_scope:eu.dnetlib.data.proto.OrganizationOrganization)
}
private static com.google.protobuf.Descriptors.Descriptor
internal_static_eu_dnetlib_data_proto_OrganizationOrganization_descriptor;
private static
com.google.protobuf.GeneratedMessage.FieldAccessorTable
internal_static_eu_dnetlib_data_proto_OrganizationOrganization_fieldAccessorTable;
public static com.google.protobuf.Descriptors.FileDescriptor
getDescriptor() {
return descriptor;
}
private static com.google.protobuf.Descriptors.FileDescriptor
descriptor;
static {
java.lang.String[] descriptorData = {
"\n\037Organization_Organization.proto\022\025eu.dn" +
"etlib.data.proto\032\021RelMetadata.proto\032\013Ded" +
"up.proto\032\025DedupSimilarity.proto\"\210\001\n\030Orga" +
"nizationOrganization\022+\n\005dedup\030\001 \001(\0132\034.eu" +
".dnetlib.data.proto.Dedup\022?\n\017dedupSimila" +
"rity\030\002 \001(\0132&.eu.dnetlib.data.proto.Dedup" +
"SimilarityB7\n\025eu.dnetlib.data.protoB\036Org" +
"anizationOrganizationProtos"
};
com.google.protobuf.Descriptors.FileDescriptor.InternalDescriptorAssigner assigner =
new com.google.protobuf.Descriptors.FileDescriptor.InternalDescriptorAssigner() {
public com.google.protobuf.ExtensionRegistry assignDescriptors(
com.google.protobuf.Descriptors.FileDescriptor root) {
descriptor = root;
internal_static_eu_dnetlib_data_proto_OrganizationOrganization_descriptor =
getDescriptor().getMessageTypes().get(0);
internal_static_eu_dnetlib_data_proto_OrganizationOrganization_fieldAccessorTable = new
com.google.protobuf.GeneratedMessage.FieldAccessorTable(
internal_static_eu_dnetlib_data_proto_OrganizationOrganization_descriptor,
new java.lang.String[] { "Dedup", "DedupSimilarity", },
eu.dnetlib.data.proto.OrganizationOrganizationProtos.OrganizationOrganization.class,
eu.dnetlib.data.proto.OrganizationOrganizationProtos.OrganizationOrganization.Builder.class);
return null;
}
};
com.google.protobuf.Descriptors.FileDescriptor
.internalBuildGeneratedFileFrom(descriptorData,
new com.google.protobuf.Descriptors.FileDescriptor[] {
eu.dnetlib.data.proto.RelMetadataProtos.getDescriptor(),
eu.dnetlib.data.proto.DedupProtos.getDescriptor(),
eu.dnetlib.data.proto.DedupSimilarityProtos.getDescriptor(),
}, assigner);
}
// @@protoc_insertion_point(outer_class_scope)
}
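The `case 10` and `case 18` labels in `OrganizationOrganization.Builder.mergeFrom(CodedInputStream, ...)` are protobuf wire tags, computed as `(fieldNumber << 3) | wireType`. Embedded messages use wire type 2 (length-delimited), so:

* `dedup = 1` gives tag 10;
* `dedupSimilarity = 2` gives tag 18;
* `readTag()` returning 0 signals end of input.

The helper below is a hypothetical illustration of that arithmetic, not a protobuf API.

```java
// Hypothetical helper reproducing the tag arithmetic behind the generated
// switch labels; not part of protobuf-java or the generated classes.
final class WireTags {
    // Wire type 2: length-delimited (strings, bytes, embedded messages).
    static final int WIRETYPE_LENGTH_DELIMITED = 2;

    // A tag packs the field number into the high bits and the wire type
    // into the low 3 bits.
    static int makeTag(int fieldNumber, int wireType) {
        return (fieldNumber << 3) | wireType;
    }

    static int fieldNumber(int tag) {
        return tag >>> 3;
    }

    static int wireType(int tag) {
        return tag & 0x7;
    }

    public static void main(String[] args) {
        // dedup = 1 and dedupSimilarity = 2 are both embedded messages.
        System.out.println(makeTag(1, WIRETYPE_LENGTH_DELIMITED)); // prints 10
        System.out.println(makeTag(2, WIRETYPE_LENGTH_DELIMITED)); // prints 18
    }
}
```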

File diff suppressed because it is too large

File diff suppressed because it is too large

@@ -0,0 +1,680 @@
// Generated by the protocol buffer compiler. DO NOT EDIT!
// source: RelMetadata.proto
package eu.dnetlib.data.proto;
public final class RelMetadataProtos {
private RelMetadataProtos() {}
public static void registerAllExtensions(
com.google.protobuf.ExtensionRegistry registry) {
}
public interface RelMetadataOrBuilder
extends com.google.protobuf.MessageOrBuilder {
// optional .eu.dnetlib.data.proto.Qualifier semantics = 1;
boolean hasSemantics();
eu.dnetlib.data.proto.FieldTypeProtos.Qualifier getSemantics();
eu.dnetlib.data.proto.FieldTypeProtos.QualifierOrBuilder getSemanticsOrBuilder();
// optional string startdate = 3;
boolean hasStartdate();
String getStartdate();
// optional string enddate = 4;
boolean hasEnddate();
String getEnddate();
}
public static final class RelMetadata extends
com.google.protobuf.GeneratedMessage
implements RelMetadataOrBuilder {
// Use RelMetadata.newBuilder() to construct.
private RelMetadata(Builder builder) {
super(builder);
}
private RelMetadata(boolean noInit) {}
private static final RelMetadata defaultInstance;
public static RelMetadata getDefaultInstance() {
return defaultInstance;
}
public RelMetadata getDefaultInstanceForType() {
return defaultInstance;
}
public static final com.google.protobuf.Descriptors.Descriptor
getDescriptor() {
return eu.dnetlib.data.proto.RelMetadataProtos.internal_static_eu_dnetlib_data_proto_RelMetadata_descriptor;
}
protected com.google.protobuf.GeneratedMessage.FieldAccessorTable
internalGetFieldAccessorTable() {
return eu.dnetlib.data.proto.RelMetadataProtos.internal_static_eu_dnetlib_data_proto_RelMetadata_fieldAccessorTable;
}
private int bitField0_;
// optional .eu.dnetlib.data.proto.Qualifier semantics = 1;
public static final int SEMANTICS_FIELD_NUMBER = 1;
private eu.dnetlib.data.proto.FieldTypeProtos.Qualifier semantics_;
public boolean hasSemantics() {
return ((bitField0_ & 0x00000001) == 0x00000001);
}
public eu.dnetlib.data.proto.FieldTypeProtos.Qualifier getSemantics() {
return semantics_;
}
public eu.dnetlib.data.proto.FieldTypeProtos.QualifierOrBuilder getSemanticsOrBuilder() {
return semantics_;
}
// optional string startdate = 3;
public static final int STARTDATE_FIELD_NUMBER = 3;
private java.lang.Object startdate_;
public boolean hasStartdate() {
return ((bitField0_ & 0x00000002) == 0x00000002);
}
public String getStartdate() {
java.lang.Object ref = startdate_;
if (ref instanceof String) {
return (String) ref;
} else {
com.google.protobuf.ByteString bs =
(com.google.protobuf.ByteString) ref;
String s = bs.toStringUtf8();
if (com.google.protobuf.Internal.isValidUtf8(bs)) {
startdate_ = s;
}
return s;
}
}
private com.google.protobuf.ByteString getStartdateBytes() {
java.lang.Object ref = startdate_;
if (ref instanceof String) {
com.google.protobuf.ByteString b =
com.google.protobuf.ByteString.copyFromUtf8((String) ref);
startdate_ = b;
return b;
} else {
return (com.google.protobuf.ByteString) ref;
}
}
// optional string enddate = 4;
public static final int ENDDATE_FIELD_NUMBER = 4;
private java.lang.Object enddate_;
public boolean hasEnddate() {
return ((bitField0_ & 0x00000004) == 0x00000004);
}
public String getEnddate() {
java.lang.Object ref = enddate_;
if (ref instanceof String) {
return (String) ref;
} else {
com.google.protobuf.ByteString bs =
(com.google.protobuf.ByteString) ref;
String s = bs.toStringUtf8();
if (com.google.protobuf.Internal.isValidUtf8(bs)) {
enddate_ = s;
}
return s;
}
}
private com.google.protobuf.ByteString getEnddateBytes() {
java.lang.Object ref = enddate_;
if (ref instanceof String) {
com.google.protobuf.ByteString b =
com.google.protobuf.ByteString.copyFromUtf8((String) ref);
enddate_ = b;
return b;
} else {
return (com.google.protobuf.ByteString) ref;
}
}
private void initFields() {
semantics_ = eu.dnetlib.data.proto.FieldTypeProtos.Qualifier.getDefaultInstance();
startdate_ = "";
enddate_ = "";
}
private byte memoizedIsInitialized = -1;
public final boolean isInitialized() {
byte isInitialized = memoizedIsInitialized;
if (isInitialized != -1) return isInitialized == 1;
if (hasSemantics()) {
if (!getSemantics().isInitialized()) {
memoizedIsInitialized = 0;
return false;
}
}
memoizedIsInitialized = 1;
return true;
}
public void writeTo(com.google.protobuf.CodedOutputStream output)
throws java.io.IOException {
getSerializedSize();
if (((bitField0_ & 0x00000001) == 0x00000001)) {
output.writeMessage(1, semantics_);
}
if (((bitField0_ & 0x00000002) == 0x00000002)) {
output.writeBytes(3, getStartdateBytes());
}
if (((bitField0_ & 0x00000004) == 0x00000004)) {
output.writeBytes(4, getEnddateBytes());
}
getUnknownFields().writeTo(output);
}
private int memoizedSerializedSize = -1;
public int getSerializedSize() {
int size = memoizedSerializedSize;
if (size != -1) return size;
size = 0;
if (((bitField0_ & 0x00000001) == 0x00000001)) {
size += com.google.protobuf.CodedOutputStream
.computeMessageSize(1, semantics_);
}
if (((bitField0_ & 0x00000002) == 0x00000002)) {
size += com.google.protobuf.CodedOutputStream
.computeBytesSize(3, getStartdateBytes());
}
if (((bitField0_ & 0x00000004) == 0x00000004)) {
size += com.google.protobuf.CodedOutputStream
.computeBytesSize(4, getEnddateBytes());
}
size += getUnknownFields().getSerializedSize();
memoizedSerializedSize = size;
return size;
}
private static final long serialVersionUID = 0L;
@java.lang.Override
protected java.lang.Object writeReplace()
throws java.io.ObjectStreamException {
return super.writeReplace();
}
public static eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata parseFrom(
com.google.protobuf.ByteString data)
throws com.google.protobuf.InvalidProtocolBufferException {
return newBuilder().mergeFrom(data).buildParsed();
}
public static eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata parseFrom(
com.google.protobuf.ByteString data,
com.google.protobuf.ExtensionRegistryLite extensionRegistry)
throws com.google.protobuf.InvalidProtocolBufferException {
return newBuilder().mergeFrom(data, extensionRegistry)
.buildParsed();
}
public static eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata parseFrom(byte[] data)
throws com.google.protobuf.InvalidProtocolBufferException {
return newBuilder().mergeFrom(data).buildParsed();
}
public static eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata parseFrom(
byte[] data,
com.google.protobuf.ExtensionRegistryLite extensionRegistry)
throws com.google.protobuf.InvalidProtocolBufferException {
return newBuilder().mergeFrom(data, extensionRegistry)
.buildParsed();
}
public static eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata parseFrom(java.io.InputStream input)
throws java.io.IOException {
return newBuilder().mergeFrom(input).buildParsed();
}
public static eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata parseFrom(
java.io.InputStream input,
com.google.protobuf.ExtensionRegistryLite extensionRegistry)
throws java.io.IOException {
return newBuilder().mergeFrom(input, extensionRegistry)
.buildParsed();
}
public static eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata parseDelimitedFrom(java.io.InputStream input)
throws java.io.IOException {
Builder builder = newBuilder();
if (builder.mergeDelimitedFrom(input)) {
return builder.buildParsed();
} else {
return null;
}
}
public static eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata parseDelimitedFrom(
java.io.InputStream input,
com.google.protobuf.ExtensionRegistryLite extensionRegistry)
throws java.io.IOException {
Builder builder = newBuilder();
if (builder.mergeDelimitedFrom(input, extensionRegistry)) {
return builder.buildParsed();
} else {
return null;
}
}
public static eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata parseFrom(
com.google.protobuf.CodedInputStream input)
throws java.io.IOException {
return newBuilder().mergeFrom(input).buildParsed();
}
public static eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata parseFrom(
com.google.protobuf.CodedInputStream input,
com.google.protobuf.ExtensionRegistryLite extensionRegistry)
throws java.io.IOException {
return newBuilder().mergeFrom(input, extensionRegistry)
.buildParsed();
}
public static Builder newBuilder() { return Builder.create(); }
public Builder newBuilderForType() { return newBuilder(); }
public static Builder newBuilder(eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata prototype) {
return newBuilder().mergeFrom(prototype);
}
public Builder toBuilder() { return newBuilder(this); }
@java.lang.Override
protected Builder newBuilderForType(
com.google.protobuf.GeneratedMessage.BuilderParent parent) {
Builder builder = new Builder(parent);
return builder;
}
public static final class Builder extends
com.google.protobuf.GeneratedMessage.Builder<Builder>
implements eu.dnetlib.data.proto.RelMetadataProtos.RelMetadataOrBuilder {
public static final com.google.protobuf.Descriptors.Descriptor
getDescriptor() {
return eu.dnetlib.data.proto.RelMetadataProtos.internal_static_eu_dnetlib_data_proto_RelMetadata_descriptor;
}
protected com.google.protobuf.GeneratedMessage.FieldAccessorTable
internalGetFieldAccessorTable() {
return eu.dnetlib.data.proto.RelMetadataProtos.internal_static_eu_dnetlib_data_proto_RelMetadata_fieldAccessorTable;
}
// Construct using eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata.newBuilder()
private Builder() {
maybeForceBuilderInitialization();
}
private Builder(BuilderParent parent) {
super(parent);
maybeForceBuilderInitialization();
}
private void maybeForceBuilderInitialization() {
if (com.google.protobuf.GeneratedMessage.alwaysUseFieldBuilders) {
getSemanticsFieldBuilder();
}
}
private static Builder create() {
return new Builder();
}
public Builder clear() {
super.clear();
if (semanticsBuilder_ == null) {
semantics_ = eu.dnetlib.data.proto.FieldTypeProtos.Qualifier.getDefaultInstance();
} else {
semanticsBuilder_.clear();
}
bitField0_ = (bitField0_ & ~0x00000001);
startdate_ = "";
bitField0_ = (bitField0_ & ~0x00000002);
enddate_ = "";
bitField0_ = (bitField0_ & ~0x00000004);
return this;
}
public Builder clone() {
return create().mergeFrom(buildPartial());
}
public com.google.protobuf.Descriptors.Descriptor
getDescriptorForType() {
return eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata.getDescriptor();
}
public eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata getDefaultInstanceForType() {
return eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata.getDefaultInstance();
}
public eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata build() {
eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata result = buildPartial();
if (!result.isInitialized()) {
throw newUninitializedMessageException(result);
}
return result;
}
private eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata buildParsed()
throws com.google.protobuf.InvalidProtocolBufferException {
eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata result = buildPartial();
if (!result.isInitialized()) {
throw newUninitializedMessageException(
result).asInvalidProtocolBufferException();
}
return result;
}
public eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata buildPartial() {
eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata result = new eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata(this);
int from_bitField0_ = bitField0_;
int to_bitField0_ = 0;
if (((from_bitField0_ & 0x00000001) == 0x00000001)) {
to_bitField0_ |= 0x00000001;
}
if (semanticsBuilder_ == null) {
result.semantics_ = semantics_;
} else {
result.semantics_ = semanticsBuilder_.build();
}
if (((from_bitField0_ & 0x00000002) == 0x00000002)) {
to_bitField0_ |= 0x00000002;
}
result.startdate_ = startdate_;
if (((from_bitField0_ & 0x00000004) == 0x00000004)) {
to_bitField0_ |= 0x00000004;
}
result.enddate_ = enddate_;
result.bitField0_ = to_bitField0_;
onBuilt();
return result;
}
public Builder mergeFrom(com.google.protobuf.Message other) {
if (other instanceof eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata) {
return mergeFrom((eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata)other);
} else {
super.mergeFrom(other);
return this;
}
}
public Builder mergeFrom(eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata other) {
if (other == eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata.getDefaultInstance()) return this;
if (other.hasSemantics()) {
mergeSemantics(other.getSemantics());
}
if (other.hasStartdate()) {
setStartdate(other.getStartdate());
}
if (other.hasEnddate()) {
setEnddate(other.getEnddate());
}
this.mergeUnknownFields(other.getUnknownFields());
return this;
}
public final boolean isInitialized() {
if (hasSemantics()) {
if (!getSemantics().isInitialized()) {
return false;
}
}
return true;
}
public Builder mergeFrom(
com.google.protobuf.CodedInputStream input,
com.google.protobuf.ExtensionRegistryLite extensionRegistry)
throws java.io.IOException {
com.google.protobuf.UnknownFieldSet.Builder unknownFields =
com.google.protobuf.UnknownFieldSet.newBuilder(
this.getUnknownFields());
while (true) {
int tag = input.readTag();
switch (tag) {
case 0:
this.setUnknownFields(unknownFields.build());
onChanged();
return this;
default: {
if (!parseUnknownField(input, unknownFields,
extensionRegistry, tag)) {
this.setUnknownFields(unknownFields.build());
onChanged();
return this;
}
break;
}
case 10: {
eu.dnetlib.data.proto.FieldTypeProtos.Qualifier.Builder subBuilder = eu.dnetlib.data.proto.FieldTypeProtos.Qualifier.newBuilder();
if (hasSemantics()) {
subBuilder.mergeFrom(getSemantics());
}
input.readMessage(subBuilder, extensionRegistry);
setSemantics(subBuilder.buildPartial());
break;
}
case 26: {
bitField0_ |= 0x00000002;
startdate_ = input.readBytes();
break;
}
case 34: {
bitField0_ |= 0x00000004;
enddate_ = input.readBytes();
break;
}
}
}
}
private int bitField0_;
// optional .eu.dnetlib.data.proto.Qualifier semantics = 1;
private eu.dnetlib.data.proto.FieldTypeProtos.Qualifier semantics_ = eu.dnetlib.data.proto.FieldTypeProtos.Qualifier.getDefaultInstance();
private com.google.protobuf.SingleFieldBuilder<
eu.dnetlib.data.proto.FieldTypeProtos.Qualifier, eu.dnetlib.data.proto.FieldTypeProtos.Qualifier.Builder, eu.dnetlib.data.proto.FieldTypeProtos.QualifierOrBuilder> semanticsBuilder_;
public boolean hasSemantics() {
return ((bitField0_ & 0x00000001) == 0x00000001);
}
public eu.dnetlib.data.proto.FieldTypeProtos.Qualifier getSemantics() {
if (semanticsBuilder_ == null) {
return semantics_;
} else {
return semanticsBuilder_.getMessage();
}
}
public Builder setSemantics(eu.dnetlib.data.proto.FieldTypeProtos.Qualifier value) {
if (semanticsBuilder_ == null) {
if (value == null) {
throw new NullPointerException();
}
semantics_ = value;
onChanged();
} else {
semanticsBuilder_.setMessage(value);
}
bitField0_ |= 0x00000001;
return this;
}
public Builder setSemantics(
eu.dnetlib.data.proto.FieldTypeProtos.Qualifier.Builder builderForValue) {
if (semanticsBuilder_ == null) {
semantics_ = builderForValue.build();
onChanged();
} else {
semanticsBuilder_.setMessage(builderForValue.build());
}
bitField0_ |= 0x00000001;
return this;
}
public Builder mergeSemantics(eu.dnetlib.data.proto.FieldTypeProtos.Qualifier value) {
if (semanticsBuilder_ == null) {
if (((bitField0_ & 0x00000001) == 0x00000001) &&
semantics_ != eu.dnetlib.data.proto.FieldTypeProtos.Qualifier.getDefaultInstance()) {
semantics_ =
eu.dnetlib.data.proto.FieldTypeProtos.Qualifier.newBuilder(semantics_).mergeFrom(value).buildPartial();
} else {
semantics_ = value;
}
onChanged();
} else {
semanticsBuilder_.mergeFrom(value);
}
bitField0_ |= 0x00000001;
return this;
}
public Builder clearSemantics() {
if (semanticsBuilder_ == null) {
semantics_ = eu.dnetlib.data.proto.FieldTypeProtos.Qualifier.getDefaultInstance();
onChanged();
} else {
semanticsBuilder_.clear();
}
bitField0_ = (bitField0_ & ~0x00000001);
return this;
}
public eu.dnetlib.data.proto.FieldTypeProtos.Qualifier.Builder getSemanticsBuilder() {
bitField0_ |= 0x00000001;
onChanged();
return getSemanticsFieldBuilder().getBuilder();
}
public eu.dnetlib.data.proto.FieldTypeProtos.QualifierOrBuilder getSemanticsOrBuilder() {
if (semanticsBuilder_ != null) {
return semanticsBuilder_.getMessageOrBuilder();
} else {
return semantics_;
}
}
private com.google.protobuf.SingleFieldBuilder<
eu.dnetlib.data.proto.FieldTypeProtos.Qualifier, eu.dnetlib.data.proto.FieldTypeProtos.Qualifier.Builder, eu.dnetlib.data.proto.FieldTypeProtos.QualifierOrBuilder>
getSemanticsFieldBuilder() {
if (semanticsBuilder_ == null) {
semanticsBuilder_ = new com.google.protobuf.SingleFieldBuilder<
eu.dnetlib.data.proto.FieldTypeProtos.Qualifier, eu.dnetlib.data.proto.FieldTypeProtos.Qualifier.Builder, eu.dnetlib.data.proto.FieldTypeProtos.QualifierOrBuilder>(
semantics_,
getParentForChildren(),
isClean());
semantics_ = null;
}
return semanticsBuilder_;
}
// optional string startdate = 3;
private java.lang.Object startdate_ = "";
public boolean hasStartdate() {
return ((bitField0_ & 0x00000002) == 0x00000002);
}
public String getStartdate() {
java.lang.Object ref = startdate_;
if (!(ref instanceof String)) {
String s = ((com.google.protobuf.ByteString) ref).toStringUtf8();
startdate_ = s;
return s;
} else {
return (String) ref;
}
}
public Builder setStartdate(String value) {
if (value == null) {
throw new NullPointerException();
}
bitField0_ |= 0x00000002;
startdate_ = value;
onChanged();
return this;
}
public Builder clearStartdate() {
bitField0_ = (bitField0_ & ~0x00000002);
startdate_ = getDefaultInstance().getStartdate();
onChanged();
return this;
}
void setStartdate(com.google.protobuf.ByteString value) {
bitField0_ |= 0x00000002;
startdate_ = value;
onChanged();
}
// optional string enddate = 4;
private java.lang.Object enddate_ = "";
public boolean hasEnddate() {
return ((bitField0_ & 0x00000004) == 0x00000004);
}
public String getEnddate() {
java.lang.Object ref = enddate_;
if (!(ref instanceof String)) {
String s = ((com.google.protobuf.ByteString) ref).toStringUtf8();
enddate_ = s;
return s;
} else {
return (String) ref;
}
}
public Builder setEnddate(String value) {
if (value == null) {
throw new NullPointerException();
}
bitField0_ |= 0x00000004;
enddate_ = value;
onChanged();
return this;
}
public Builder clearEnddate() {
bitField0_ = (bitField0_ & ~0x00000004);
enddate_ = getDefaultInstance().getEnddate();
onChanged();
return this;
}
void setEnddate(com.google.protobuf.ByteString value) {
bitField0_ |= 0x00000004;
enddate_ = value;
onChanged();
}
// @@protoc_insertion_point(builder_scope:eu.dnetlib.data.proto.RelMetadata)
}
static {
defaultInstance = new RelMetadata(true);
defaultInstance.initFields();
}
// @@protoc_insertion_point(class_scope:eu.dnetlib.data.proto.RelMetadata)
}
private static com.google.protobuf.Descriptors.Descriptor
internal_static_eu_dnetlib_data_proto_RelMetadata_descriptor;
private static
com.google.protobuf.GeneratedMessage.FieldAccessorTable
internal_static_eu_dnetlib_data_proto_RelMetadata_fieldAccessorTable;
public static com.google.protobuf.Descriptors.FileDescriptor
getDescriptor() {
return descriptor;
}
private static com.google.protobuf.Descriptors.FileDescriptor
descriptor;
static {
java.lang.String[] descriptorData = {
"\n\021RelMetadata.proto\022\025eu.dnetlib.data.pro" +
"to\032\017FieldType.proto\"f\n\013RelMetadata\0223\n\tse" +
"mantics\030\001 \001(\0132 .eu.dnetlib.data.proto.Qu" +
"alifier\022\021\n\tstartdate\030\003 \001(\t\022\017\n\007enddate\030\004 " +
"\001(\tB*\n\025eu.dnetlib.data.protoB\021RelMetadat" +
"aProtos"
};
com.google.protobuf.Descriptors.FileDescriptor.InternalDescriptorAssigner assigner =
new com.google.protobuf.Descriptors.FileDescriptor.InternalDescriptorAssigner() {
public com.google.protobuf.ExtensionRegistry assignDescriptors(
com.google.protobuf.Descriptors.FileDescriptor root) {
descriptor = root;
internal_static_eu_dnetlib_data_proto_RelMetadata_descriptor =
getDescriptor().getMessageTypes().get(0);
internal_static_eu_dnetlib_data_proto_RelMetadata_fieldAccessorTable = new
com.google.protobuf.GeneratedMessage.FieldAccessorTable(
internal_static_eu_dnetlib_data_proto_RelMetadata_descriptor,
new java.lang.String[] { "Semantics", "Startdate", "Enddate", },
eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata.class,
eu.dnetlib.data.proto.RelMetadataProtos.RelMetadata.Builder.class);
return null;
}
};
com.google.protobuf.Descriptors.FileDescriptor
.internalBuildGeneratedFileFrom(descriptorData,
new com.google.protobuf.Descriptors.FileDescriptor[] {
eu.dnetlib.data.proto.FieldTypeProtos.getDescriptor(),
}, assigner);
}
// @@protoc_insertion_point(outer_class_scope)
}
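The parser in `RelMetadata.Builder.mergeFrom(...)` switches on raw wire tags (`case 10`, `case 26`, `case 34`), and the `has*()` accessors test bits in `bitField0_`. The sketch below is illustrative only and is not part of the generated code. It shows how those constants arise. A protobuf tag is `(fieldNumber << 3) | wireType`, and all three fields use wire type 2 (length-delimited). Presence bits are packed consecutively for the declared optional fields, so `enddate` (field 4) uses mask `0x4` rather than `1 << 4`. The class and method names are hypothetical.

```java
// Illustrative sketch (hypothetical names, not generated code): how the wire
// tags in RelMetadata.Builder.mergeFrom(...) and the bitField0_ masks are derived.
public class WireTagSketch {
    // Protobuf wire type 2 = length-delimited (strings, bytes, embedded messages).
    static final int WIRETYPE_LENGTH_DELIMITED = 2;

    // A field's tag on the wire is (fieldNumber << 3) | wireType.
    static int tag(int fieldNumber, int wireType) {
        return (fieldNumber << 3) | wireType;
    }

    // Presence bits are packed consecutively over the declared optional fields:
    // semantics -> bit 0, startdate -> bit 1, enddate -> bit 2 (field number 2 is unused).
    static int presenceMask(int declarationIndex) {
        return 1 << declarationIndex;
    }

    // Mirrors the generated check ((bitField0_ & mask) == mask).
    static boolean has(int bitField, int declarationIndex) {
        int mask = presenceMask(declarationIndex);
        return (bitField & mask) == mask;
    }

    public static void main(String[] args) {
        System.out.println(tag(1, WIRETYPE_LENGTH_DELIMITED)); // semantics -> 10
        System.out.println(tag(3, WIRETYPE_LENGTH_DELIMITED)); // startdate -> 26
        System.out.println(tag(4, WIRETYPE_LENGTH_DELIMITED)); // enddate   -> 34
        int bits = presenceMask(0) | presenceMask(2);           // semantics and enddate set
        System.out.println(has(bits, 1));                       // false: startdate not set
    }
}
```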

@ -0,0 +1,228 @@
// Generated by the protocol buffer compiler. DO NOT EDIT!
// source: RelType.proto
package eu.dnetlib.data.proto;
public final class RelTypeProtos {
private RelTypeProtos() {}
public static void registerAllExtensions(
com.google.protobuf.ExtensionRegistry registry) {
}
public enum RelType
implements com.google.protobuf.ProtocolMessageEnum {
datasourceOrganization(0, 1),
projectOrganization(1, 4),
resultOrganization(2, 5),
resultProject(3, 6),
resultResult(4, 9),
organizationOrganization(5, 11),
;
public static final int datasourceOrganization_VALUE = 1;
public static final int projectOrganization_VALUE = 4;
public static final int resultOrganization_VALUE = 5;
public static final int resultProject_VALUE = 6;
public static final int resultResult_VALUE = 9;
public static final int organizationOrganization_VALUE = 11;
public final int getNumber() { return value; }
public static RelType valueOf(int value) {
switch (value) {
case 1: return datasourceOrganization;
case 4: return projectOrganization;
case 5: return resultOrganization;
case 6: return resultProject;
case 9: return resultResult;
case 11: return organizationOrganization;
default: return null;
}
}
public static com.google.protobuf.Internal.EnumLiteMap<RelType>
internalGetValueMap() {
return internalValueMap;
}
private static com.google.protobuf.Internal.EnumLiteMap<RelType>
internalValueMap =
new com.google.protobuf.Internal.EnumLiteMap<RelType>() {
public RelType findValueByNumber(int number) {
return RelType.valueOf(number);
}
};
public final com.google.protobuf.Descriptors.EnumValueDescriptor
getValueDescriptor() {
return getDescriptor().getValues().get(index);
}
public final com.google.protobuf.Descriptors.EnumDescriptor
getDescriptorForType() {
return getDescriptor();
}
public static final com.google.protobuf.Descriptors.EnumDescriptor
getDescriptor() {
return eu.dnetlib.data.proto.RelTypeProtos.getDescriptor().getEnumTypes().get(0);
}
private static final RelType[] VALUES = {
datasourceOrganization, projectOrganization, resultOrganization, resultProject, resultResult, organizationOrganization,
};
public static RelType valueOf(
com.google.protobuf.Descriptors.EnumValueDescriptor desc) {
if (desc.getType() != getDescriptor()) {
throw new java.lang.IllegalArgumentException(
"EnumValueDescriptor is not for this type.");
}
return VALUES[desc.getIndex()];
}
private final int index;
private final int value;
private RelType(int index, int value) {
this.index = index;
this.value = value;
}
// @@protoc_insertion_point(enum_scope:eu.dnetlib.data.proto.RelType)
}
public enum SubRelType
implements com.google.protobuf.ProtocolMessageEnum {
provision(0, 1),
participation(1, 4),
outcome(2, 6),
similarity(3, 8),
publicationDataset(4, 9),
affiliation(5, 12),
dedup(6, 10),
dedupSimilarity(7, 11),
supplement(8, 13),
part(9, 15),
version(10, 16),
relationship(11, 17),
;
public static final int provision_VALUE = 1;
public static final int participation_VALUE = 4;
public static final int outcome_VALUE = 6;
public static final int similarity_VALUE = 8;
public static final int publicationDataset_VALUE = 9;
public static final int affiliation_VALUE = 12;
public static final int dedup_VALUE = 10;
public static final int dedupSimilarity_VALUE = 11;
public static final int supplement_VALUE = 13;
public static final int part_VALUE = 15;
public static final int version_VALUE = 16;
public static final int relationship_VALUE = 17;
public final int getNumber() { return value; }
public static SubRelType valueOf(int value) {
switch (value) {
case 1: return provision;
case 4: return participation;
case 6: return outcome;
case 8: return similarity;
case 9: return publicationDataset;
case 12: return affiliation;
case 10: return dedup;
case 11: return dedupSimilarity;
case 13: return supplement;
case 15: return part;
case 16: return version;
case 17: return relationship;
default: return null;
}
}
public static com.google.protobuf.Internal.EnumLiteMap<SubRelType>
internalGetValueMap() {
return internalValueMap;
}
private static com.google.protobuf.Internal.EnumLiteMap<SubRelType>
internalValueMap =
new com.google.protobuf.Internal.EnumLiteMap<SubRelType>() {
public SubRelType findValueByNumber(int number) {
return SubRelType.valueOf(number);
}
};
public final com.google.protobuf.Descriptors.EnumValueDescriptor
getValueDescriptor() {
return getDescriptor().getValues().get(index);
}
public final com.google.protobuf.Descriptors.EnumDescriptor
getDescriptorForType() {
return getDescriptor();
}
public static final com.google.protobuf.Descriptors.EnumDescriptor
getDescriptor() {
return eu.dnetlib.data.proto.RelTypeProtos.getDescriptor().getEnumTypes().get(1);
}
private static final SubRelType[] VALUES = {
provision, participation, outcome, similarity, publicationDataset, affiliation, dedup, dedupSimilarity, supplement, part, version, relationship,
};
public static SubRelType valueOf(
com.google.protobuf.Descriptors.EnumValueDescriptor desc) {
if (desc.getType() != getDescriptor()) {
throw new java.lang.IllegalArgumentException(
"EnumValueDescriptor is not for this type.");
}
return VALUES[desc.getIndex()];
}
private final int index;
private final int value;
private SubRelType(int index, int value) {
this.index = index;
this.value = value;
}
// @@protoc_insertion_point(enum_scope:eu.dnetlib.data.proto.SubRelType)
}
public static com.google.protobuf.Descriptors.FileDescriptor
getDescriptor() {
return descriptor;
}
private static com.google.protobuf.Descriptors.FileDescriptor
descriptor;
static {
java.lang.String[] descriptorData = {
"\n\rRelType.proto\022\025eu.dnetlib.data.proto*\231" +
"\001\n\007RelType\022\032\n\026datasourceOrganization\020\001\022\027" +
"\n\023projectOrganization\020\004\022\026\n\022resultOrganiz" +
"ation\020\005\022\021\n\rresultProject\020\006\022\020\n\014resultResu" +
"lt\020\t\022\034\n\030organizationOrganization\020\013*\315\001\n\nS" +
"ubRelType\022\r\n\tprovision\020\001\022\021\n\rparticipatio" +
"n\020\004\022\013\n\007outcome\020\006\022\016\n\nsimilarity\020\010\022\026\n\022publ" +
"icationDataset\020\t\022\017\n\013affiliation\020\014\022\t\n\005ded" +
"up\020\n\022\023\n\017dedupSimilarity\020\013\022\016\n\nsupplement\020" +
"\r\022\010\n\004part\020\017\022\013\n\007version\020\020\022\020\n\014relationship",
"\020\021B&\n\025eu.dnetlib.data.protoB\rRelTypeProt" +
"os"
};
com.google.protobuf.Descriptors.FileDescriptor.InternalDescriptorAssigner assigner =
new com.google.protobuf.Descriptors.FileDescriptor.InternalDescriptorAssigner() {
public com.google.protobuf.ExtensionRegistry assignDescriptors(
com.google.protobuf.Descriptors.FileDescriptor root) {
descriptor = root;
return null;
}
};
com.google.protobuf.Descriptors.FileDescriptor
.internalBuildGeneratedFileFrom(descriptorData,
new com.google.protobuf.Descriptors.FileDescriptor[] {
}, assigner);
}
// @@protoc_insertion_point(outer_class_scope)
}
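Each generated enum constant carries two integers, as in `resultResult(4, 9)`. The first is the declaration index and the second is the number written on the wire. `valueOf(int)` maps wire numbers back to constants and returns `null` for unassigned numbers such as 2. The index is not ordered by number. For example, `SubRelType.affiliation` (12) is declared before `dedup` (10). The following is a hypothetical, simplified mirror of that pattern, not the generated `RelType` class. It uses a linear scan in place of the generated `switch`.

```java
// Hypothetical, simplified mirror of the generated RelType enum pattern.
// It is NOT the generated class; it only illustrates index vs. wire number.
public class EnumNumberSketch {
    enum RelTypeSketch {
        datasourceOrganization(1),
        projectOrganization(4),
        resultOrganization(5),
        resultProject(6),
        resultResult(9),
        organizationOrganization(11);

        final int number; // value written to the wire

        RelTypeSketch(int number) { this.number = number; }

        // Lookup by wire number; unassigned numbers yield null, like RelType.valueOf(int).
        static RelTypeSketch forNumber(int n) {
            for (RelTypeSketch t : values()) {
                if (t.number == n) return t;
            }
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(RelTypeSketch.resultResult.ordinal()); // 4 (declaration index)
        System.out.println(RelTypeSketch.resultResult.number);    // 9 (wire number)
        System.out.println(RelTypeSketch.forNumber(2));           // null: 2 is unassigned
    }
}
```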

File diff suppressed because it is too large

@ -0,0 +1,109 @@
// Generated by the protocol buffer compiler. DO NOT EDIT!
// source: SpecialTrust.proto
package eu.dnetlib.data.proto;
public final class SpecialTrustProtos {
private SpecialTrustProtos() {}
public static void registerAllExtensions(
com.google.protobuf.ExtensionRegistry registry) {
}
public enum SpecialTrust
implements com.google.protobuf.ProtocolMessageEnum {
INFINITE(0, 1),
NEUTRAL(1, 2),
;
public static final int INFINITE_VALUE = 1;
public static final int NEUTRAL_VALUE = 2;
public final int getNumber() { return value; }
public static SpecialTrust valueOf(int value) {
switch (value) {
case 1: return INFINITE;
case 2: return NEUTRAL;
default: return null;
}
}
public static com.google.protobuf.Internal.EnumLiteMap<SpecialTrust>
internalGetValueMap() {
return internalValueMap;
}
private static com.google.protobuf.Internal.EnumLiteMap<SpecialTrust>
internalValueMap =
new com.google.protobuf.Internal.EnumLiteMap<SpecialTrust>() {
public SpecialTrust findValueByNumber(int number) {
return SpecialTrust.valueOf(number);
}
};
public final com.google.protobuf.Descriptors.EnumValueDescriptor
getValueDescriptor() {
return getDescriptor().getValues().get(index);
}
public final com.google.protobuf.Descriptors.EnumDescriptor
getDescriptorForType() {
return getDescriptor();
}
public static final com.google.protobuf.Descriptors.EnumDescriptor
getDescriptor() {
return eu.dnetlib.data.proto.SpecialTrustProtos.getDescriptor().getEnumTypes().get(0);
}
private static final SpecialTrust[] VALUES = {
INFINITE, NEUTRAL,
};
public static SpecialTrust valueOf(
com.google.protobuf.Descriptors.EnumValueDescriptor desc) {
if (desc.getType() != getDescriptor()) {
throw new java.lang.IllegalArgumentException(
"EnumValueDescriptor is not for this type.");
}
return VALUES[desc.getIndex()];
}
private final int index;
private final int value;
private SpecialTrust(int index, int value) {
this.index = index;
this.value = value;
}
// @@protoc_insertion_point(enum_scope:eu.dnetlib.data.proto.SpecialTrust)
}
public static com.google.protobuf.Descriptors.FileDescriptor
getDescriptor() {
return descriptor;
}
private static com.google.protobuf.Descriptors.FileDescriptor
descriptor;
static {
java.lang.String[] descriptorData = {
"\n\022SpecialTrust.proto\022\025eu.dnetlib.data.pr" +
"oto*)\n\014SpecialTrust\022\014\n\010INFINITE\020\001\022\013\n\007NEU" +
"TRAL\020\002B+\n\025eu.dnetlib.data.protoB\022Special" +
"TrustProtos"
};
com.google.protobuf.Descriptors.FileDescriptor.InternalDescriptorAssigner assigner =
new com.google.protobuf.Descriptors.FileDescriptor.InternalDescriptorAssigner() {
public com.google.protobuf.ExtensionRegistry assignDescriptors(
com.google.protobuf.Descriptors.FileDescriptor root) {
descriptor = root;
return null;
}
};
com.google.protobuf.Descriptors.FileDescriptor
.internalBuildGeneratedFileFrom(descriptorData,
new com.google.protobuf.Descriptors.FileDescriptor[] {
}, assigner);
}
// @@protoc_insertion_point(outer_class_scope)
}

@ -0,0 +1,118 @@
// Generated by the protocol buffer compiler. DO NOT EDIT!
// source: Type.proto
package eu.dnetlib.data.proto;
public final class TypeProtos {
private TypeProtos() {}
public static void registerAllExtensions(
com.google.protobuf.ExtensionRegistry registry) {
}
public enum Type
implements com.google.protobuf.ProtocolMessageEnum {
datasource(0, 10),
organization(1, 20),
person(2, 30),
project(3, 40),
result(4, 50),
;
public static final int datasource_VALUE = 10;
public static final int organization_VALUE = 20;
public static final int person_VALUE = 30;
public static final int project_VALUE = 40;
public static final int result_VALUE = 50;
public final int getNumber() { return value; }
public static Type valueOf(int value) {
switch (value) {
case 10: return datasource;
case 20: return organization;
case 30: return person;
case 40: return project;
case 50: return result;
default: return null;
}
}
public static com.google.protobuf.Internal.EnumLiteMap<Type>
internalGetValueMap() {
return internalValueMap;
}
private static com.google.protobuf.Internal.EnumLiteMap<Type>
internalValueMap =
new com.google.protobuf.Internal.EnumLiteMap<Type>() {
public Type findValueByNumber(int number) {
return Type.valueOf(number);
}
};
public final com.google.protobuf.Descriptors.EnumValueDescriptor
getValueDescriptor() {
return getDescriptor().getValues().get(index);
}
public final com.google.protobuf.Descriptors.EnumDescriptor
getDescriptorForType() {
return getDescriptor();
}
public static final com.google.protobuf.Descriptors.EnumDescriptor
getDescriptor() {
return eu.dnetlib.data.proto.TypeProtos.getDescriptor().getEnumTypes().get(0);
}
private static final Type[] VALUES = {
datasource, organization, person, project, result,
};
public static Type valueOf(
com.google.protobuf.Descriptors.EnumValueDescriptor desc) {
if (desc.getType() != getDescriptor()) {
throw new java.lang.IllegalArgumentException(
"EnumValueDescriptor is not for this type.");
}
return VALUES[desc.getIndex()];
}
private final int index;
private final int value;
private Type(int index, int value) {
this.index = index;
this.value = value;
}
// @@protoc_insertion_point(enum_scope:eu.dnetlib.data.proto.Type)
}
public static com.google.protobuf.Descriptors.FileDescriptor
getDescriptor() {
return descriptor;
}
private static com.google.protobuf.Descriptors.FileDescriptor
descriptor;
static {
java.lang.String[] descriptorData = {
"\n\nType.proto\022\025eu.dnetlib.data.proto*M\n\004T" +
"ype\022\016\n\ndatasource\020\n\022\020\n\014organization\020\024\022\n\n" +
"\006person\020\036\022\013\n\007project\020(\022\n\n\006result\0202B#\n\025eu" +
".dnetlib.data.protoB\nTypeProtos"
};
com.google.protobuf.Descriptors.FileDescriptor.InternalDescriptorAssigner assigner =
new com.google.protobuf.Descriptors.FileDescriptor.InternalDescriptorAssigner() {
public com.google.protobuf.ExtensionRegistry assignDescriptors(
com.google.protobuf.Descriptors.FileDescriptor root) {
descriptor = root;
return null;
}
};
com.google.protobuf.Descriptors.FileDescriptor
.internalBuildGeneratedFileFrom(descriptorData,
new com.google.protobuf.Descriptors.FileDescriptor[] {
}, assigner);
}
// @@protoc_insertion_point(outer_class_scope)
}

@@ -0,0 +1,61 @@
package eu.dnetlib.data.proto;
import "FieldType.proto";
option java_package = "eu.dnetlib.data.proto";
option java_outer_classname = "DatasourceProtos";
message Datasource {
optional Metadata metadata = 2;
message Metadata {
// common fields
optional Qualifier datasourcetype = 15;
optional Qualifier openairecompatibility = 17;
optional StringField officialname = 1;
optional StringField englishname = 2;
optional StringField websiteurl = 3;
optional StringField logourl = 4;
optional StringField contactemail = 5;
optional StringField namespaceprefix = 7;
optional StringField latitude = 18;
optional StringField longitude = 19;
optional StringField dateofvalidation = 20;
optional StringField description = 21;
repeated StructuredProperty subjects = 45;
// opendoar specific fields (od*)
optional StringField odnumberofitems = 9;
optional StringField odnumberofitemsdate = 10;
optional StringField odpolicies = 12;
repeated StringField odlanguages = 13;
repeated StringField odcontenttypes = 14;
repeated StringField accessinfopackage = 6;
// re3data fields
optional StringField releasestartdate = 31;
optional StringField releaseenddate = 32;
optional StringField missionstatementurl = 33;
optional BoolField dataprovider = 34;
optional BoolField serviceprovider = 35;
optional StringField databaseaccesstype = 36; // {open, restricted or closed}
optional StringField datauploadtype = 37; // {open, restricted or closed}
optional StringField databaseaccessrestriction = 38; // {feeRequired, registration, other}
optional StringField datauploadrestriction = 39; // {feeRequired, registration, other}
optional BoolField versioning = 40;
optional StringField citationguidelineurl = 41;
optional StringField qualitymanagementkind = 42; // {yes, no, unknown}
optional StringField pidsystems = 43;
optional StringField certificates = 44;
repeated KeyValue policies = 46;
}
}

@@ -0,0 +1,23 @@
package eu.dnetlib.data.proto;
import "RelMetadata.proto";
option java_package = "eu.dnetlib.data.proto";
option java_outer_classname = "DatasourceOrganizationProtos";
message DatasourceOrganization {
optional Provision provision = 1;
message Provision {
enum RelName {
isProvidedBy = 1;
provides = 2;
}
required RelMetadata relMetadata = 1;
}
}

@@ -0,0 +1,16 @@
package eu.dnetlib.data.proto;
import "RelMetadata.proto";
option java_package = "eu.dnetlib.data.proto";
option java_outer_classname = "DedupProtos";
message Dedup {
enum RelName {
isMergedIn = 1;
merges = 2;
}
required RelMetadata relMetadata = 1;
}

@@ -0,0 +1,16 @@
package eu.dnetlib.data.proto;
import "RelMetadata.proto";
option java_package = "eu.dnetlib.data.proto";
option java_outer_classname = "DedupSimilarityProtos";
message DedupSimilarity {
enum RelName {
isSimilarTo = 1;
}
required RelMetadata relMetadata = 1;
}

@@ -0,0 +1,104 @@
package eu.dnetlib.data.proto;
option java_package = "eu.dnetlib.data.proto";
option java_outer_classname = "FieldTypeProtos";
message StringField {
required string value = 1;
optional DataInfo dataInfo = 2;
}
message BoolField {
required bool value = 1;
optional DataInfo dataInfo = 2;
}
message IntField {
required int32 value = 1;
optional DataInfo dataInfo = 2;
}
message StructuredProperty {
required string value = 1;
optional Qualifier qualifier = 2;
optional DataInfo dataInfo = 3;
}
// Generic container for identified values, e.g:
// <oaf:hostedBy name="Publications at Bielefeld University" id="opendoar::2294"/>
// <oaf:collectedFrom name="Publications at Bielefeld University" id="opendoar::2294"/>
message KeyValue {
required string key = 1;
optional string value = 2;
optional DataInfo dataInfo = 3;
}
message Qualifier {
optional string classid = 1;
optional string classname = 2;
optional string schemeid = 3;
optional string schemename = 4;
optional DataInfo dataInfo = 5;
}
message DataInfo {
optional bool invisible = 6 [default = false];
optional bool inferred = 1;
optional bool deletedbyinference = 2;
optional string trust = 3;
optional string inferenceprovenance = 4;
required Qualifier provenanceaction = 5;
}
message OAIProvenance {
optional OriginDescription originDescription = 1;
message OriginDescription {
optional string harvestDate = 1;
optional bool altered = 2 [default = true];
optional string baseURL = 3;
optional string identifier = 4;
optional string datestamp = 5;
optional string metadataNamespace = 6;
optional OriginDescription originDescription = 7;
}
}
message ExtraInfo {
required string name = 1;
required string typology = 2;
required string provenance = 3;
optional string trust = 4;
// json containing a Citation or Statistics
optional string value = 5;
}
message Author {
required string fullname = 1;
optional string name = 2;
optional string surname = 3;
required int32 rank = 4;
repeated KeyValue pid = 5;
repeated StringField affiliation = 6;
}

@@ -0,0 +1,9 @@
package eu.dnetlib.data.proto;
option java_package = "eu.dnetlib.data.proto";
option java_outer_classname = "KindProtos";
enum Kind {
entity = 1;
relation = 2;
}

@@ -0,0 +1,97 @@
package eu.dnetlib.data.proto;
import "Kind.proto";
import "FieldType.proto";
// for Oafentity
import "Type.proto";
import "Datasource.proto";
import "Organization.proto";
import "Project.proto";
import "Result.proto";
import "Person.proto";
// for OafRel
import "RelType.proto";
import "Datasource_Organization.proto";
import "Project_Organization.proto";
import "Result_Organization.proto";
import "Result_Project.proto";
import "Result_Result.proto";
import "Organization_Organization.proto";
option java_package = "eu.dnetlib.data.proto";
option java_outer_classname = "OafProtos";
message Oaf {
required Kind kind = 1;
optional OafEntity entity = 2;
optional OafRel rel = 3;
optional DataInfo dataInfo = 4;
// used to mark the last update time of this object
optional sfixed64 lastupdatetimestamp = 5;
}
message OafEntity {
required Type type = 1;
required string id = 12;
repeated string originalId = 8;
repeated KeyValue collectedfrom = 9;
repeated StructuredProperty pid = 10;
optional string dateofcollection = 11;
optional string dateoftransformation = 13;
/* Any relation that we want to bundle together with this entity.
It's intended to be used only in temporary values in map/red jobs (sequence files, ...)
and never persisted values stored in HBase. */
repeated OafRel cachedRel = 2;
repeated Oaf cachedOafRel = 18;
optional Datasource datasource = 3;
optional Organization organization = 4;
optional Project project = 6;
optional Result result = 7;
optional Person person = 5;
repeated OafEntity children = 16;
repeated ExtraInfo extraInfo = 15;
optional OAIProvenance oaiprovenance = 17;
}
message OafRel {
required RelType relType = 1;
required SubRelType subRelType = 19;
required string relClass = 20; // one among the SubRel names, e.g. Provision.RelName.isProvidedBy
required string source = 2;
required string target = 3;
/* if true then is a "child" */
required bool child = 4;
optional OafEntity cachedTarget = 5;
// needed to have more information that is not included in cachedTarget.
optional Oaf cachedOafTarget = 8;
// Datamodel rels
optional DatasourceOrganization datasourceOrganization = 6;
optional ProjectOrganization projectOrganization = 9;
optional ResultOrganization resultOrganization = 10;
optional ResultProject resultProject = 11;
optional ResultResult resultResult = 16;
optional OrganizationOrganization organizationOrganization = 17;
repeated KeyValue collectedfrom = 21;
}

@@ -0,0 +1,33 @@
package eu.dnetlib.data.proto;
import "FieldType.proto";
option java_package = "eu.dnetlib.data.proto";
option java_outer_classname = "OrganizationProtos";
message Organization {
optional Metadata metadata = 2;
message Metadata {
optional StringField legalshortname = 1;
optional StringField legalname = 2;
repeated StringField alternativeNames = 17;
optional StringField websiteurl = 3;
optional StringField logourl = 4;
optional StringField eclegalbody = 5;
optional StringField eclegalperson = 6;
optional StringField ecnonprofit = 7;
optional StringField ecresearchorganization = 8;
optional StringField echighereducation = 9;
optional StringField ecinternationalorganizationeurinterests = 10;
optional StringField ecinternationalorganization = 11;
optional StringField ecenterprise = 12;
optional StringField ecsmevalidated = 13;
optional StringField ecnutscode = 14;
optional Qualifier country = 16;
}
}

@@ -0,0 +1,15 @@
package eu.dnetlib.data.proto;
import "RelMetadata.proto";
import "Dedup.proto";
import "DedupSimilarity.proto";
option java_package = "eu.dnetlib.data.proto";
option java_outer_classname = "OrganizationOrganizationProtos";
message OrganizationOrganization {
optional Dedup dedup = 1;
optional DedupSimilarity dedupSimilarity = 2;
}

@@ -0,0 +1,25 @@
package eu.dnetlib.data.proto;
import "FieldType.proto";
option java_package = "eu.dnetlib.data.proto";
option java_outer_classname = "PersonProtos";
message Person {
optional Metadata metadata = 2;
message Metadata {
required string fullname = 1;
optional string firstname = 2;
optional string lastname = 3;
required string pubID = 4;
optional string pubDOI = 5;
optional string orcid = 6;
required int32 rank = 7;
repeated string coauthors = 8;
repeated double topics = 9;
required string area = 10;
}
}

@@ -0,0 +1,42 @@
package eu.dnetlib.data.proto;
import "FieldType.proto";
option java_package = "eu.dnetlib.data.proto";
option java_outer_classname = "ProjectProtos";
message Project {
optional Metadata metadata = 2;
message Metadata {
optional StringField websiteurl = 1;
optional StringField code = 2;
optional StringField acronym = 3;
optional StringField title = 4;
optional StringField startdate = 5;
optional StringField enddate = 6;
optional StringField callidentifier = 7;
optional StringField keywords = 8;
optional StringField duration = 9;
optional StringField ecsc39 = 10;
optional StringField oamandatepublications = 11;
optional StringField ecarticle29_3 = 12;
repeated StructuredProperty subjects = 14;
repeated StringField fundingtree = 15;
optional Qualifier contracttype = 13;
optional StringField optional1 = 16;
optional StringField optional2 = 17;
optional StringField jsonextrainfo = 18;
optional StringField contactfullname = 19;
optional StringField contactfax = 20;
optional StringField contactphone = 21;
optional StringField contactemail = 22;
}
}

@@ -0,0 +1,23 @@
package eu.dnetlib.data.proto;
import "RelMetadata.proto";
option java_package = "eu.dnetlib.data.proto";
option java_outer_classname = "ProjectOrganizationProtos";
message ProjectOrganization {
optional Participation participation = 1;
message Participation {
enum RelName {
isParticipant = 1;
hasParticipant = 2;
}
required RelMetadata relMetadata = 1;
optional string participantnumber = 2;
}
}

@@ -0,0 +1,15 @@
package eu.dnetlib.data.proto;
import "FieldType.proto";
option java_package = "eu.dnetlib.data.proto";
option java_outer_classname = "RelMetadataProtos";
message RelMetadata {
optional Qualifier semantics = 1;
optional string startdate = 3;
optional string enddate = 4;
}

@@ -0,0 +1,36 @@
package eu.dnetlib.data.proto;
option java_package = "eu.dnetlib.data.proto";
option java_outer_classname = "RelTypeProtos";
enum RelType {
// Datamodel rels
datasourceOrganization = 1;
projectOrganization = 4;
resultOrganization = 5;
resultProject = 6;
resultResult = 9;
organizationOrganization = 11;
}
enum SubRelType {
provision = 1; // datasourceOrganization
participation = 4; // projectOrganization
outcome = 6; // resultProject
similarity = 8; // resultResult
publicationDataset = 9; // resultResult
affiliation = 12; // resultOrganization
dedup = 10; // resultResult | organizationOrganization
dedupSimilarity = 11; // resultResult | organizationOrganization
supplement = 13; // resultResult
part = 15; // resultResult
version = 16; // resultResult
relationship = 17; // catch all
}
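
The trailing comments in `RelType.proto` record which `RelType` each `SubRelType` belongs to. The sketch below transcribes that table into a consistency check. It uses hypothetical plain-Java mirrors (`RelKind`, `SubRelKind`, `RelTypeCheck`) rather than the protoc-generated classes, and it assumes that the catch-all `relationship` is valid under every `RelType`.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical plain-Java mirrors of the RelType and SubRelType proto enums.
enum RelKind {
    datasourceOrganization, projectOrganization, resultOrganization,
    resultProject, resultResult, organizationOrganization
}

enum SubRelKind {
    provision, participation, outcome, similarity, publicationDataset, affiliation,
    dedup, dedupSimilarity, supplement, part, version, relationship
}

final class RelTypeCheck {
    private static final Map<SubRelKind, Set<RelKind>> ALLOWED = new EnumMap<>(SubRelKind.class);

    static {
        // Transcribed from the trailing comments in RelType.proto.
        ALLOWED.put(SubRelKind.provision, EnumSet.of(RelKind.datasourceOrganization));
        ALLOWED.put(SubRelKind.participation, EnumSet.of(RelKind.projectOrganization));
        ALLOWED.put(SubRelKind.outcome, EnumSet.of(RelKind.resultProject));
        ALLOWED.put(SubRelKind.affiliation, EnumSet.of(RelKind.resultOrganization));
        for (SubRelKind s : EnumSet.of(SubRelKind.similarity, SubRelKind.publicationDataset,
                SubRelKind.supplement, SubRelKind.part, SubRelKind.version)) {
            ALLOWED.put(s, EnumSet.of(RelKind.resultResult));
        }
        ALLOWED.put(SubRelKind.dedup,
                EnumSet.of(RelKind.resultResult, RelKind.organizationOrganization));
        ALLOWED.put(SubRelKind.dedupSimilarity,
                EnumSet.of(RelKind.resultResult, RelKind.organizationOrganization));
        // "catch all": assumed to be valid under every RelType.
        ALLOWED.put(SubRelKind.relationship, EnumSet.allOf(RelKind.class));
    }

    // True if the subRelType is documented as belonging to the given relType.
    static boolean isConsistent(RelKind rel, SubRelKind sub) {
        Set<RelKind> allowed = ALLOWED.get(sub);
        return allowed != null && allowed.contains(rel);
    }
}
```

For example, an `OafRel` with `relType = resultProject` and `subRelType = provision` would fail this check, because `provision` is documented only under `datasourceOrganization`.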

@@ -0,0 +1,128 @@
package eu.dnetlib.data.proto;
import "FieldType.proto";
option java_package = "eu.dnetlib.data.proto";
option java_outer_classname = "ResultProtos";
message Result {
optional Metadata metadata = 2;
repeated Instance instance = 6;
repeated ExternalReference externalReference = 7;
message Metadata {
repeated Author author = 40;
// resulttype allows subclassing results into publications | datasets | software
optional Qualifier resulttype = 8;
// common fields
optional Qualifier language = 12;
repeated Qualifier country = 33;
repeated StructuredProperty subject = 1;
repeated StructuredProperty title = 2;
repeated StructuredProperty relevantdate = 3;
repeated StringField description = 25;
optional StringField dateofacceptance = 4;
optional StringField publisher = 5;
optional StringField embargoenddate = 6;
repeated StringField source = 27;
repeated StringField fulltext = 29; // remove candidate
repeated StringField format = 21;
repeated StringField contributor = 30;
optional Qualifier resourcetype = 19;
repeated StringField coverage = 43;
repeated Context context = 28;
// publication specific
optional Journal journal = 18;
// dataset specific
optional StringField storagedate = 9;
optional StringField device = 26;
optional StringField size = 20;
optional StringField version = 22;
optional StringField lastmetadataupdate = 23;
optional StringField metadataversionnumber = 24;
repeated GeoLocation geolocation = 44;
// software specific
repeated StringField documentationUrl = 35;
repeated StructuredProperty license = 36;
optional StringField codeRepositoryUrl = 38;
optional Qualifier programmingLanguage = 39;
// other research products specifics
repeated StringField contactperson = 45;
repeated StringField contactgroup = 41;
repeated StringField tool = 42;
}
message Journal {
optional string name = 1;
optional string issnPrinted = 2;
optional string issnOnline = 3;
optional string issnLinking = 4;
optional string ep = 6;
optional string iss = 7;
optional string sp = 8;
optional string vol = 9;
optional string edition = 10;
optional string conferenceplace = 11;
optional string conferencedate = 12;
optional DataInfo dataInfo = 5;
}
// <concept id="egi::vo::alice" />
message Context {
required string id = 1;
repeated DataInfo dataInfo = 2;
}
message Instance {
optional StringField license = 6;
optional Qualifier accessright = 3;
optional Qualifier instancetype = 4;
optional KeyValue hostedby = 5;
repeated string url = 9;
// other research products specific
optional string distributionlocation = 12;
optional KeyValue collectedfrom = 10;
optional StringField dateofacceptance = 11;
}
message ExternalReference {
optional string sitename = 1; // source
optional string label = 2; // title
optional string url = 3; // text()
optional string description = 4; // ?? not mapped yet ??
optional Qualifier qualifier = 5; // type
optional string refidentifier = 6; // site internal identifier
optional string query = 7; // maps the oaf:reference/@query attribute
optional DataInfo dataInfo = 8; // ExternalReferences might be also inferred
}
message GeoLocation {
optional string point = 1;
optional string box = 2;
optional string place = 3;
}
}

@@ -0,0 +1,23 @@
package eu.dnetlib.data.proto;
import "RelMetadata.proto";
option java_package = "eu.dnetlib.data.proto";
option java_outer_classname = "ResultOrganizationProtos";
message ResultOrganization {
optional Affiliation affiliation = 1;
message Affiliation {
enum RelName {
isAuthorInstitutionOf = 1; // Organization --> Result
hasAuthorInstitution = 2; // Result --> Organization
}
required RelMetadata relMetadata = 1;
}
}

@@ -0,0 +1,23 @@
package eu.dnetlib.data.proto;
import "RelMetadata.proto";
option java_package = "eu.dnetlib.data.proto";
option java_outer_classname = "ResultProjectProtos";
message ResultProject {
optional Outcome outcome = 1;
message Outcome {
enum RelName {
isProducedBy = 1;
produces = 2;
}
required RelMetadata relMetadata = 1;
}
}

@@ -0,0 +1,94 @@
package eu.dnetlib.data.proto;
import "RelMetadata.proto";
import "Dedup.proto";
import "DedupSimilarity.proto";
option java_package = "eu.dnetlib.data.proto";
option java_outer_classname = "ResultResultProtos";
message ResultResult {
//choice of the possible subtypes
optional Similarity similarity = 2;
optional PublicationDataset publicationDataset = 3;
optional Dedup dedup = 4;
optional DedupSimilarity dedupSimilarity = 5;
optional Supplement supplement = 6;
optional Part part = 7;
optional Relationship relationship = 8;
optional SoftwareSoftware softwareSoftware = 9;
message Similarity {
enum RelName {
isAmongTopNSimilarDocuments = 1;
hasAmongTopNSimilarDocuments = 2;
}
required RelMetadata relMetadata = 1;
// level of similarity: coefficient from [0, 1] range,
// the greater the number, the more similar the documents
optional float similarity = 2;
enum Type {
STANDARD = 1;
WEBUSAGE = 2;
}
// similarity type
optional Type type = 3 [default = STANDARD];
}
message PublicationDataset {
enum RelName {
isRelatedTo = 1;
}
required RelMetadata relMetadata = 1;
}
message Supplement {
enum RelName {
isSupplementTo = 1;
isSupplementedBy = 2;
}
required RelMetadata relMetadata = 1;
}
message Part {
enum RelName {
isPartOf = 1;
hasPart = 2;
}
required RelMetadata relMetadata = 1;
}
message SoftwareSoftware {
enum RelName {
isVersionOf = 1;
}
required RelMetadata relMetadata = 1;
}
message Relationship {
enum RelName {
isRelatedTo = 1;
}
required RelMetadata relMetadata = 1;
}
}

@@ -0,0 +1,9 @@
package eu.dnetlib.data.proto;
option java_package = "eu.dnetlib.data.proto";
option java_outer_classname = "SpecialTrustProtos";
enum SpecialTrust {
INFINITE = 1;
NEUTRAL = 2;
}

@@ -0,0 +1,13 @@
package eu.dnetlib.data.proto;
option java_package = "eu.dnetlib.data.proto";
option java_outer_classname = "TypeProtos";
// entity tags are used as rowkey prefixes on hbase
enum Type {
datasource = 10;
organization = 20;
person = 30;
project = 40;
result = 50;
}
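
The comment above says that the entity tags double as HBase row key prefixes. The following is a minimal sketch of that idea. The `<number>|<id>` layout, the `|` separator and the names `EntityType` and `RowKeys` are assumptions made for illustration and are not taken from this commit; only the numeric values come from `Type.proto`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical mirror of the Type proto enum; the numbers are copied from Type.proto.
enum EntityType {
    datasource(10), organization(20), person(30), project(40), result(50);

    final int number;
    private static final Map<Integer, EntityType> BY_NUMBER = new HashMap<>();

    static {
        for (EntityType t : values()) BY_NUMBER.put(t.number, t);
    }

    EntityType(int number) {
        this.number = number;
    }

    // Like the generated Type.valueOf(int): returns null for an unknown number.
    static EntityType fromNumber(int n) {
        return BY_NUMBER.get(n);
    }
}

final class RowKeys {
    // Assumed key layout "<type number>|<entity id>".
    static String rowKey(EntityType type, String id) {
        return type.number + "|" + id;
    }

    // Reads the type back from the prefix; null if the prefix is missing or unknown.
    static EntityType typeOf(String rowKey) {
        int sep = rowKey.indexOf('|');
        if (sep <= 0) return null;
        try {
            return EntityType.fromNumber(Integer.parseInt(rowKey.substring(0, sep)));
        } catch (NumberFormatException e) {
            return null;
        }
    }
}
```

All five tags are two-digit numbers, so keys built this way have a fixed-width prefix. As a result, rows of the same entity type sit next to each other in HBase's lexicographic key order.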

@@ -64,11 +64,16 @@
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-databind</artifactId>
</dependency>
<dependency>
<groupId>org.codehaus.jackson</groupId>
<artifactId>jackson-mapper-asl</artifactId>
</dependency>
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-math3</artifactId>
</dependency>
</dependencies>

@@ -1,16 +1,15 @@
package eu.dnetlib.pace.clustering;
import java.util.*;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import com.google.common.collect.Sets;
import eu.dnetlib.pace.common.AbstractPaceFunctions;
import eu.dnetlib.pace.model.Field;
import org.apache.commons.lang.StringUtils;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
public abstract class AbstractClusteringFunction extends AbstractPaceFunctions implements ClusteringFunction {
protected Map<String, Integer> params;
@@ -26,7 +25,7 @@ public abstract class AbstractClusteringFunction extends AbstractPaceFunctions i
return fields.stream().filter(f -> !f.isEmpty())
.map(Field::stringValue)
.map(this::normalize)
.map(s -> filterStopWords(s, stopwords))
.map(s -> filterAllStopWords(s))
.map(this::doApply)
.map(c -> filterBlacklisted(c, ngramBlacklist))
.flatMap(c -> c.stream())
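
After this change, the clustering pipeline processes fields in the following order:

1. Drop empty fields.
2. Normalize each value.
3. Strip stopwords for every loaded language (`filterAllStopWords`), not only English.
4. Apply the concrete clustering function.
5. Remove blacklisted keys.

The self-contained sketch below reproduces that order of operations. Several parts are hypothetical simplifications: the three small stopword sets, the `normalize` that lowercases and collapses whitespace, and the `prefixes` key function that stands in for `doApply`.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

final class ClusteringPipelineSketch {
    // Tiny illustrative lists; the real ones are loaded from stopwords_<lang>.txt.
    static final Set<String> STOP_EN = new HashSet<>(Arrays.asList("the", "of", "and"));
    static final Set<String> STOP_DE = new HashSet<>(Arrays.asList("der", "und"));
    static final Set<String> STOP_IT = new HashSet<>(Arrays.asList("di", "della"));
    static final List<Set<String>> STOPWORDS = Arrays.asList(STOP_EN, STOP_DE, STOP_IT);

    // Hypothetical normalize: lower-case and collapse whitespace.
    static String normalize(String s) {
        return s.toLowerCase(Locale.ROOT).replaceAll("\\s+", " ").trim();
    }

    // Like filterAllStopWords: apply each language's list in turn.
    static String filterAllStopWords(String s) {
        for (Set<String> sw : STOPWORDS) {
            s = Arrays.stream(s.split(" "))
                    .filter(t -> !sw.contains(t))
                    .collect(Collectors.joining(" "));
        }
        return s;
    }

    // Hypothetical doApply: the first three characters of each token.
    static Collection<String> prefixes(String s) {
        return Arrays.stream(s.split(" "))
                .filter(t -> !t.isEmpty())
                .map(t -> t.substring(0, Math.min(3, t.length())))
                .collect(Collectors.toList());
    }

    static Collection<String> apply(List<String> fields,
                                    Function<String, Collection<String>> doApply,
                                    Set<String> blacklist) {
        return fields.stream()
                .filter(f -> !f.isEmpty())
                .map(ClusteringPipelineSketch::normalize)
                .map(ClusteringPipelineSketch::filterAllStopWords)
                .map(doApply)
                .flatMap(Collection::stream)
                .filter(k -> !blacklist.contains(k))   // like filterBlacklisted
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }
}
```

Because the lists are applied one after another, a German article such as "der" is now removed even from a title whose other words are English.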

@@ -1,25 +1,17 @@
package eu.dnetlib.pace.clustering;
import java.io.Serializable;
import com.google.common.collect.Sets;
import eu.dnetlib.pace.common.AbstractPaceFunctions;
import eu.dnetlib.pace.model.Field;
import eu.dnetlib.pace.model.Person;
import org.apache.commons.lang.StringUtils;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Set;
import eu.dnetlib.pace.model.FieldList;
import eu.dnetlib.pace.model.FieldValue;
import org.apache.commons.lang.StringUtils;
import com.google.common.base.Splitter;
import com.google.common.collect.Iterables;
import com.google.common.collect.Sets;
import eu.dnetlib.pace.common.AbstractPaceFunctions;
import eu.dnetlib.pace.model.Field;
import eu.dnetlib.pace.model.gt.Author;
import eu.dnetlib.pace.model.gt.GTAuthor;
@ClusteringClass("personclustering")
@ClusteringClass("personClustering")
public class PersonClustering extends AbstractPaceFunctions implements ClusteringFunction {
private Map<String, Integer> params;
@@ -36,14 +28,13 @@ public class PersonClustering extends AbstractPaceFunctions implements Clusterin
for (final Field f : fields) {
final GTAuthor gta = GTAuthor.fromOafJson(f.stringValue());
final Person person = new Person(f.stringValue(), false);
final Author a = gta.getAuthor();
if (a.isWellFormed()) {
hashes.add(firstLC(a.getFirstname()) + a.getSecondnames().toLowerCase());
if (StringUtils.isNotBlank(person.getNormalisedFirstName()) && StringUtils.isNotBlank(person.getNormalisedSurname())) {
hashes.add(firstLC(person.getNormalisedFirstName()) + person.getNormalisedSurname().toLowerCase());
} else {
for (final String token1 : tokens(a.getFullname())) {
for (final String token2 : tokens(a.getFullname())) {
for (final String token1 : tokens(f.stringValue(), MAX_TOKENS)) {
for (final String token2 : tokens(f.stringValue(), MAX_TOKENS)) {
if (!token1.equals(token2)) {
hashes.add(firstLC(token1) + token2);
}
@@ -55,13 +46,31 @@ public class PersonClustering extends AbstractPaceFunctions implements Clusterin
return hashes;
}
private String firstLC(final String s) {
return StringUtils.substring(s, 0, 1).toLowerCase();
}
private Iterable<String> tokens(final String s) {
return Iterables.limit(Splitter.on(" ").omitEmptyStrings().trimResults().split(s), MAX_TOKENS);
}
// @Override
// public Collection<String> apply(final List<Field> fields) {
// final Set<String> hashes = Sets.newHashSet();
//
// for (final Field f : fields) {
//
// final GTAuthor gta = GTAuthor.fromOafJson(f.stringValue());
//
// final Author a = gta.getAuthor();
//
// if (StringUtils.isNotBlank(a.getFirstname()) && StringUtils.isNotBlank(a.getSecondnames())) {
// hashes.add(firstLC(a.getFirstname()) + a.getSecondnames().toLowerCase());
// } else {
// for (final String token1 : tokens(f.stringValue(), MAX_TOKENS)) {
// for (final String token2 : tokens(f.stringValue(), MAX_TOKENS)) {
// if (!token1.equals(token2)) {
// hashes.add(firstLC(token1) + token2);
// }
// }
// }
// }
// }
//
// return hashes;
// }
@Override
public Map<String, Integer> getParams() {
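
The reworked `PersonClustering.apply` emits keys for each author as follows:

- When both normalised name parts are present, it emits one key: the lower-cased first initial plus the lower-cased surname.
- Otherwise, it emits a key for every ordered pair of distinct tokens from the raw value. Each key is the lower-cased initial of the first token plus the second token, with the second token's case kept.

A standalone sketch of that rule follows. `PersonHashSketch` is hypothetical: it takes the name parts as plain strings instead of going through the `Person` model. The token limit is a parameter because the value of `MAX_TOKENS` is not shown in this diff.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

final class PersonHashSketch {
    // Lower-cased first character, "" for an empty string (like firstLC).
    static String firstLC(String s) {
        return s.isEmpty() ? "" : s.substring(0, 1).toLowerCase(Locale.ROOT);
    }

    // Space-separated, trimmed, non-empty tokens, at most maxTokens of them (like tokens()).
    static List<String> tokens(String s, int maxTokens) {
        return Arrays.stream(s.split(" "))
                .map(String::trim)
                .filter(t -> !t.isEmpty())
                .limit(maxTokens)
                .collect(Collectors.toList());
    }

    static boolean isNotBlank(String s) {
        return s != null && !s.trim().isEmpty();
    }

    static Set<String> hashes(String firstName, String surname, String rawValue, int maxTokens) {
        Set<String> out = new LinkedHashSet<>();
        if (isNotBlank(firstName) && isNotBlank(surname)) {
            // Well-formed name: initial + surname, e.g. "Ada", "Lovelace" -> "alovelace".
            out.add(firstLC(firstName) + surname.toLowerCase(Locale.ROOT));
        } else {
            // Fallback on the raw value: every ordered pair of distinct tokens.
            List<String> toks = tokens(rawValue, maxTokens);
            for (String t1 : toks) {
                for (String t2 : toks) {
                    if (!t1.equals(t2)) out.add(firstLC(t1) + t2);
                }
            }
        }
        return out;
    }
}
```

The fallback is quadratic in the number of tokens, which is why the token count is capped.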

@@ -1,6 +1,7 @@
package eu.dnetlib.pace.common;
import com.google.common.base.Joiner;
import com.google.common.base.Splitter;
import com.google.common.collect.Iterables;
import com.google.common.collect.Lists;
import com.google.common.collect.Sets;
@@ -26,7 +27,12 @@ import java.util.regex.Pattern;
*/
public abstract class AbstractPaceFunctions {
protected static Set<String> stopwords = loadFromClasspath("/eu/dnetlib/pace/config/stopwords_en.txt");
protected static Set<String> stopwords_en = loadFromClasspath("/eu/dnetlib/pace/config/stopwords_en.txt");
protected static Set<String> stopwords_de = loadFromClasspath("/eu/dnetlib/pace/config/stopwords_de.txt");
protected static Set<String> stopwords_es = loadFromClasspath("/eu/dnetlib/pace/config/stopwords_es.txt");
protected static Set<String> stopwords_fr = loadFromClasspath("/eu/dnetlib/pace/config/stopwords_fr.txt");
protected static Set<String> stopwords_it = loadFromClasspath("/eu/dnetlib/pace/config/stopwords_it.txt");
protected static Set<String> stopwords_pt = loadFromClasspath("/eu/dnetlib/pace/config/stopwords_pt.txt");
protected static Set<String> ngramBlacklist = loadFromClasspath("/eu/dnetlib/pace/config/ngram_blacklist.txt");
@@ -41,8 +47,9 @@ public abstract class AbstractPaceFunctions {
}
protected String cleanup(final String s) {
final String s1 = nfd(s);
final String s2 = fixAliases(s1);
final String s0 = s.toLowerCase();
final String s1 = fixAliases(s0);
final String s2 = nfd(s1);
final String s3 = s2.replaceAll("&ndash;", " ");
final String s4 = s3.replaceAll("&amp;", " ");
final String s5 = s4.replaceAll("&quot;", " ");
@@ -139,6 +146,18 @@ public abstract class AbstractPaceFunctions {
return sb.toString().trim();
}
protected String filterAllStopWords(String s) {
s = filterStopWords(s, stopwords_en);
s = filterStopWords(s, stopwords_de);
s = filterStopWords(s, stopwords_it);
s = filterStopWords(s, stopwords_fr);
s = filterStopWords(s, stopwords_pt);
s = filterStopWords(s, stopwords_es);
return s;
}
protected Collection<String> filterBlacklisted(final Collection<String> set, final Set<String> ngramBlacklist) {
final Set<String> newset = Sets.newLinkedHashSet();
for (final String s : set) {
@@ -171,7 +190,7 @@ public abstract class AbstractPaceFunctions {
String[] line = s.split(";");
String value = line[0];
for (String key: line){
m.put(fixAliases(key),value);
m.put(fixAliases(key).toLowerCase(),value);
}
}
} catch (final Throwable e){
@@ -191,17 +210,50 @@ public abstract class AbstractPaceFunctions {
return sb.toString().trim();
}
//TODO remove also codes of the cities
public String removeCodes(String s) {
final String regex = "\\bkey::[0-9]*\\b";
return s.replaceAll(regex, "").trim();
public String keywordsToCode(String s1, Map<String, String> translationMap, int windowSize){
List<String> tokens = Arrays.asList(s1.split(" "));
if (tokens.size()<windowSize)
windowSize = tokens.size();
int length = windowSize;
while (length != 0) {
for (int i = 0; i<=tokens.size()-length; i++){
String candidate = Joiner.on(" ").join(tokens.subList(i, i + length));
if (translationMap.containsKey(candidate)) {
s1 = (" " + s1 + " ").replaceAll(" " + candidate + " ", " " + translationMap.get(candidate) + " ");
}
}
length-=1;
}
return s1;
}
public String removeCodes(String s) {
final String regexKey = "\\bkey::[0-9]*\\b";
final String regexCity = "\\bcity::[0-9]*\\b";
return s.replaceAll(regexKey, "").replaceAll(regexCity, "").trim();
}
public double keywordsCompare(String s1, String s2){
List<String> keywords1 = getKeywords(s1);
List<String> keywords2 = getKeywords(s2);
int longer = (keywords1.size()>keywords2.size())?keywords1.size():keywords2.size();
if (getKeywords(s1).isEmpty() || getKeywords(s2).isEmpty())
return 1.0;
else
return (double)CollectionUtils.intersection(getKeywords(s1),getKeywords(s2)).size()/(double)longer;
}
//check if 2 strings have same keywords
public boolean sameKeywords(String s1, String s2){
//all keywords in common
//return getKeywords(s1).containsAll(getKeywords(s2)) && getKeywords(s2).containsAll(getKeywords(s1));
//at least 1 keyword in common
if (getKeywords(s1).isEmpty() || getKeywords(s2).isEmpty())
return true;
@@ -209,11 +261,36 @@ public abstract class AbstractPaceFunctions {
return CollectionUtils.intersection(getKeywords(s1),getKeywords(s2)).size()>0;
}
//returns true if at least 1 city is in common
//returns true if a name has no cities
public boolean sameCity(String s1, String s2){
if (getCities(s1).isEmpty() || getCities(s2).isEmpty())
return true;
else
return CollectionUtils.intersection(getCities(s1), getCities(s2)).size()>0;
}
//get the list of cities in a string
public List<String> getCities(String s) {
final String regex = "\\bcity::[0-9]*\\b";
Pattern p = Pattern.compile(regex, Pattern.MULTILINE);
Matcher m = p.matcher(s);
List<String> codes = new ArrayList<>();
while (m.find()) {
codes.add(m.group(0));
for (int i = 1; i <= m.groupCount(); i++) {
codes.add(m.group(0));
}
}
return codes;
}
//get the list of keywords in a string
public List<String> getKeywords(String s) {
// final String regex = " \\d+ ";
final String regex = "\\bkey::[0-9]*\\b";
Pattern p = Pattern.compile(regex, Pattern.MULTILINE);
@@ -228,4 +305,13 @@ public abstract class AbstractPaceFunctions {
return codes;
}
protected String firstLC(final String s) {
return StringUtils.substring(s, 0, 1).toLowerCase();
}
protected Iterable<String> tokens(final String s, final int maxTokens) {
return Iterables.limit(Splitter.on(" ").omitEmptyStrings().trimResults().split(s), maxTokens);
}
}
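
`keywordsToCode` slides a window over the tokens, from `windowSize` words down to one word. It replaces any phrase found in the translation map with its code, so longer phrases win over their sub-phrases.

The sketch below is a variant, not the committed method, and differs from it in two ways:

- It pads the string once and trims at the end.
- It quotes the candidate with `Pattern.quote` and the code with `Matcher.quoteReplacement`. The committed version passes the candidate to `replaceAll` as a regex, so a phrase such as `c.n.r.` would also match unrelated text.

The class name `KeywordCoder` is hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class KeywordCoder {
    // Replaces the longest known phrases first: window sizes go from
    // windowSize down to 1, as in keywordsToCode.
    static String keywordsToCode(String s, Map<String, String> translationMap, int windowSize) {
        List<String> tokens = Arrays.asList(s.split(" "));
        String padded = " " + s + " ";
        for (int length = Math.min(windowSize, tokens.size()); length > 0; length--) {
            for (int i = 0; i + length <= tokens.size(); i++) {
                String candidate = String.join(" ", tokens.subList(i, i + length));
                String code = translationMap.get(candidate);
                if (code != null) {
                    // Quote both sides so the phrase and the code are used literally.
                    padded = padded.replaceAll(" " + Pattern.quote(candidate) + " ",
                            Matcher.quoteReplacement(" " + code + " "));
                }
            }
        }
        return padded.trim();
    }
}
```

Once a long phrase has been replaced, its shorter sub-phrases no longer occur in the string, so a map entry such as `research` does not touch text already consumed by `national research council`.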

@@ -1,20 +1,21 @@
package eu.dnetlib.pace.config;
import java.io.Serializable;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import com.google.common.collect.Lists;
import com.google.common.collect.Maps;
import eu.dnetlib.pace.condition.ConditionAlgo;
import eu.dnetlib.pace.model.ClusteringDef;
import eu.dnetlib.pace.model.CondDef;
import eu.dnetlib.pace.model.FieldDef;
import eu.dnetlib.pace.model.TreeNodeDef;
import eu.dnetlib.pace.util.PaceResolver;
import org.apache.commons.collections.CollectionUtils;
import org.codehaus.jackson.annotate.JsonIgnore;
import java.io.Serializable;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
public class PaceConfig implements Serializable {
private List<FieldDef> model;
@@ -23,6 +24,8 @@ public class PaceConfig implements Serializable {
private List<ClusteringDef> clustering;
private Map<String, List<String>> blacklists;
private Map<String, TreeNodeDef> decisionTree;
private Map<String, FieldDef> modelMap;
public static PaceResolver paceResolver;
@@ -58,6 +61,14 @@ public class PaceConfig implements Serializable {
return conditions;
}
public Map<String, TreeNodeDef> getDecisionTree() {
return decisionTree;
}
public void setDecisionTree(Map<String, TreeNodeDef> decisionTree) {
this.decisionTree = decisionTree;
}
@JsonIgnore
public List<ConditionAlgo> getConditionAlgos() {
return asConditionAlgos(getConditions());
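
`PaceConfig` now carries the `decisionTree` map from node name to `TreeNodeDef`. Per the README, each node turns its aggregated score into one of three results:

- positive when `score >= threshold`
- negative when `score < threshold`
- undefined when `score == -1`

The node's arcs name the next node for each result. The sketch below walks such a map. Several parts of it are assumptions, not the project's API:

- the `TreeNode` and `DecisionTree` types
- the start node name
- the terminal names `MATCH` and `NO_MATCH`
- the rule that a missing arc means no match

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.function.ToDoubleBiFunction;

enum MatchResult { POSITIVE, NEGATIVE, UNDEFINED }

// Hypothetical, simplified stand-in for TreeNodeDef.
final class TreeNode {
    final ToDoubleBiFunction<Map<String, String>, Map<String, String>> score;
    final double threshold;
    final Map<MatchResult, String> arcs = new EnumMap<>(MatchResult.class);

    TreeNode(ToDoubleBiFunction<Map<String, String>, Map<String, String>> score, double threshold) {
        this.score = score;
        this.threshold = threshold;
    }

    TreeNode arc(MatchResult result, String nextNode) {
        arcs.put(result, nextNode);
        return this;
    }

    // README semantics: -1 is undefined, >= threshold positive, otherwise negative.
    MatchResult evaluate(Map<String, String> a, Map<String, String> b) {
        double s = score.applyAsDouble(a, b);
        if (s == -1) return MatchResult.UNDEFINED;
        return s >= threshold ? MatchResult.POSITIVE : MatchResult.NEGATIVE;
    }
}

final class DecisionTree {
    static final String MATCH = "MATCH";
    static final String NO_MATCH = "NO_MATCH";

    // Follows arcs from `start` until a terminal name is reached.
    static boolean matches(Map<String, TreeNode> tree, String start,
                           Map<String, String> a, Map<String, String> b) {
        String current = start;
        // A path through n nodes needs at most n + 1 steps; more means a cycle.
        for (int step = 0; step <= tree.size(); step++) {
            if (MATCH.equals(current)) return true;
            if (NO_MATCH.equals(current)) return false;
            TreeNode node = tree.get(current);
            if (node == null) throw new IllegalStateException("unknown node: " + current);
            String next = node.arcs.get(node.evaluate(a, b));
            current = (next == null) ? NO_MATCH : next;  // assumption: missing arc = no match
        }
        throw new IllegalStateException("cycle in decision tree");
    }
}
```

The undefined arc lets a node hand over to a weaker check when its own fields are missing, for example falling back from an identifier comparison to a title comparison.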

@@ -5,7 +5,6 @@ import eu.dnetlib.pace.common.AbstractPaceFunctions;
import eu.dnetlib.pace.distance.DistanceClass;
import eu.dnetlib.pace.distance.SecondStringDistanceAlgo;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
@@ -22,8 +21,13 @@ public class JaroWinklerNormalizedName extends SecondStringDistanceAlgo {
//key=word, value=global identifier => example: "università"->"university", used to substitute the word with the global identifier
private static Map<String,String> translationMap = AbstractPaceFunctions.loadMapFromClasspath("/eu/dnetlib/pace/config/translation_map.csv");
private static Map<String,String> cityMap = AbstractPaceFunctions.loadMapFromClasspath("/eu/dnetlib/pace/config/city_map.csv");
private Map<String, Number> params;
public JaroWinklerNormalizedName(Map<String, Number> params){
super(params, new com.wcohen.ss.JaroWinkler());
this.params = params;
}
public JaroWinklerNormalizedName(double weight) {
@@ -43,13 +47,27 @@ public class JaroWinklerNormalizedName extends SecondStringDistanceAlgo {
cb = removeStopwords(cb);
//replace keywords with codes
ca = translate(ca, translationMap);
cb = translate(cb, translationMap);
String codesA = keywordsToCode(ca, translationMap, params.getOrDefault("windowSize", 4).intValue());
String codesB = keywordsToCode(cb, translationMap, params.getOrDefault("windowSize",4).intValue());
if (sameKeywords(ca,cb)) {
return normalize(ssalgo.score(removeCodes(ca), removeCodes(cb)));
//replace cities with codes
codesA = keywordsToCode(codesA, cityMap, params.getOrDefault("windowSize", 4).intValue());
codesB = keywordsToCode(codesB, cityMap, params.getOrDefault("windowSize", 4).intValue());
//if the two names mention the same city
if (sameCity(codesA,codesB)){
if (keywordsCompare(codesA, codesB)>params.getOrDefault("threshold", 0.5).doubleValue()) {
ca = removeCodes(codesA);
cb = removeCodes(codesB);
if (ca.isEmpty() && cb.isEmpty())
return 1.0;
else
return normalize(ssalgo.score(ca,cb));
}
}
return 0.0;
}
@Override

@ -0,0 +1,67 @@
package eu.dnetlib.pace.model;
import eu.dnetlib.pace.util.PaceException;
import org.codehaus.jackson.map.ObjectMapper;
import java.io.IOException;
import java.io.Serializable;
import java.util.Map;
public class FieldConf implements Serializable {
private String field; //name of the field on which to apply the comparator
private String comparator; //comparator name
private double weight = 1.0; //weight for the field (to be used in the aggregation)
private Map<String,Number> params; //parameters
public FieldConf() {
}
public FieldConf(String field, String comparator, double weight, Map<String, Number> params) {
this.field = field;
this.comparator = comparator;
this.weight = weight;
this.params = params;
}
public String getField() {
return field;
}
public void setField(String field) {
this.field = field;
}
public String getComparator() {
return comparator;
}
public void setComparator(String comparator) {
this.comparator = comparator;
}
public double getWeight() {
return weight;
}
public void setWeight(double weight) {
this.weight = weight;
}
public Map<String, Number> getParams() {
return params;
}
public void setParams(Map<String, Number> params) {
this.params = params;
}
@Override
public String toString() {
try {
return new ObjectMapper().writeValueAsString(this);
} catch (IOException e) {
throw new PaceException("Impossible to convert to JSON: ", e);
}
}
}

@ -1,19 +1,16 @@
package eu.dnetlib.pace.model;
import java.io.Serializable;
import java.lang.reflect.InvocationTargetException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import com.google.common.base.Splitter;
import com.google.common.collect.Lists;
import com.google.gson.Gson;
import eu.dnetlib.pace.config.PaceConfig;
import eu.dnetlib.pace.config.Type;
import eu.dnetlib.pace.distance.*;
import eu.dnetlib.pace.distance.algo.*;
import eu.dnetlib.pace.util.PaceException;
import eu.dnetlib.pace.distance.DistanceAlgo;
import java.io.Serializable;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
/**
* The schema is composed by field definitions (FieldDef). Each field has a type, a name, and an associated distance algorithm.

@ -14,4 +14,11 @@ public interface FieldList extends List<Field>, Field {
*/
public List<String> stringList();
/**
* Double[] Array
*
* @return the double[] array
*/
public double[] doubleArray();
}

@ -1,21 +1,18 @@
package eu.dnetlib.pace.model;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;
import java.util.ListIterator;
import com.google.common.base.Function;
import com.google.common.base.Joiner;
import com.google.common.base.Predicate;
import com.google.common.collect.Iterables;
import com.google.common.collect.Lists;
import com.google.gson.Gson;
import com.google.gson.GsonBuilder;
import com.google.gson.JsonObject;
import eu.dnetlib.pace.config.Type;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;
import java.util.ListIterator;
/**
* The Class FieldListImpl.
*/
@ -319,6 +316,21 @@ public class FieldListImpl extends AbstractField implements FieldList {
};
}
@Override
public double[] doubleArray() {
return Lists.newArrayList(Iterables.transform(fields, getDouble())).stream().mapToDouble(d-> d).toArray();
}
private Function<Field,Double> getDouble() {
return new Function<Field, Double>() {
@Override
public Double apply(final Field f) {
return Double.parseDouble(f.stringValue());
}
};
}
@Override
public String toString() {
return stringList().toString();
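`doubleArray()` turns each field's string value into a `double` through a Guava `Function`, collects the results into a list, and then unboxes them. A minimal sketch of the same conversion with plain Java streams, assuming the values are already available as strings (the class and method names are hypothetical); as in the original, a non-numeric value makes `Double.parseDouble` throw a `NumberFormatException`.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for FieldListImpl.doubleArray(): parses each string value as a double.
class DoubleArraySketch {

    static double[] doubleArray(List<String> values) {
        // Double.parseDouble throws NumberFormatException on non-numeric input, as in the original
        return values.stream().mapToDouble(Double::parseDouble).toArray();
    }

    public static void main(String[] args) {
        double[] topics = doubleArray(Arrays.asList("0.25", "0.75"));
        System.out.println(Arrays.toString(topics));
    }
}
```

Mapping straight to a `DoubleStream` avoids the intermediate `ArrayList` of boxed `Double` values that the Guava version builds first.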

@ -0,0 +1,145 @@
package eu.dnetlib.pace.model;
import eu.dnetlib.pace.config.PaceConfig;
import eu.dnetlib.pace.tree.Comparator;
import eu.dnetlib.pace.tree.support.AggType;
import eu.dnetlib.pace.util.PaceException;
import org.apache.commons.math3.stat.descriptive.DescriptiveStatistics;
import org.codehaus.jackson.map.ObjectMapper;
import java.io.IOException;
import java.io.Serializable;
import java.util.List;
public class TreeNodeDef implements Serializable {
private List<FieldConf> fields; //list of fields involved in the tree node (each entry names the field and the comparator to apply to it)
private AggType aggregation; //how to aggregate similarity measures for every field
private double threshold; //threshold on the similarity measure
private String positive; //specifies the next node in case of positive result: similarity>=th
private String negative; //specifies the next node in case of negative result: similarity<th
private String undefined; //specifies the next node in case of undefined result: similarity=-1
boolean ignoreMissing = true; //if true, undefined (-1) field scores are skipped; if false, the node evaluates to -1
public TreeNodeDef() {
}
//compute the similarity measure between two documents
public double evaluate(MapDocument doc1, MapDocument doc2) {
DescriptiveStatistics stats = new DescriptiveStatistics();
for (FieldConf fieldConf : fields) {
double weight = fieldConf.getWeight();
double similarity = comparator(fieldConf).compare(doc1.getFieldMap().get(fieldConf.getField()), doc2.getFieldMap().get(fieldConf.getField()));
//a similarity of -1 means the comparator returned undefined: do not add it to the stats
if (similarity != -1) {
stats.addValue(weight * similarity);
}
else {
if (!ignoreMissing) //missing values must not be ignored: the whole node is undefined (-1)
return -1;
}
}
switch (aggregation){
case AVG:
return stats.getMean();
case SUM:
return stats.getSum();
case MAX:
return stats.getMax();
case MIN:
return stats.getMin();
default:
return 0.0;
}
}
private Comparator comparator(final FieldConf field){
return PaceConfig.paceResolver.getComparator(field.getComparator(), field.getParams());
}
public TreeNodeDef(List<FieldConf> fields, double threshold, AggType aggregation, String positive, String negative, String undefined) {
this.fields = fields;
this.threshold = threshold;
this.aggregation = aggregation;
this.positive = positive;
this.negative = negative;
this.undefined = undefined;
}
public boolean isIgnoreMissing() {
return ignoreMissing;
}
public void setIgnoreMissing(boolean ignoreMissing) {
this.ignoreMissing = ignoreMissing;
}
public List<FieldConf> getFields() {
return fields;
}
public void setFields(List<FieldConf> fields) {
this.fields = fields;
}
public double getThreshold() {
return threshold;
}
public void setThreshold(double threshold) {
this.threshold = threshold;
}
public AggType getAggregation() {
return aggregation;
}
public void setAggregation(AggType aggregation) {
this.aggregation = aggregation;
}
public String getPositive() {
return positive;
}
public void setPositive(String positive) {
this.positive = positive;
}
public String getNegative() {
return negative;
}
public void setNegative(String negative) {
this.negative = negative;
}
public String getUndefined() {
return undefined;
}
public void setUndefined(String undefined) {
this.undefined = undefined;
}
@Override
public String toString() {
try {
return new ObjectMapper().writeValueAsString(this);
} catch (IOException e) {
throw new PaceException("Impossible to convert to JSON: ", e);
}
}
}
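`evaluate` weights each comparator score, skips undefined (-1) scores unless `ignoreMissing` is false, and aggregates what is left; the README then routes on the node threshold (score >= th positive, -1 undefined, score < th negative). The dependency-free sketch below reproduces that logic on plain arrays so it can run in isolation. It is an illustration, not the library's API: `TreeNodeSketch`, its `evaluate(double[], double[], ...)` and `route` are hypothetical names, and plain streams stand in for commons-math's `DescriptiveStatistics`.

```java
import java.util.Arrays;

// Hypothetical, dependency-free sketch of TreeNodeDef.evaluate plus README-style threshold routing.
class TreeNodeSketch {

    enum AggType { AVG, SUM, MAX, MIN }

    // scores: raw comparator outputs (-1 = undefined); weights: one weight per score
    static double evaluate(double[] scores, double[] weights, AggType agg, boolean ignoreMissing) {
        double[] kept = new double[scores.length];
        int n = 0;
        for (int i = 0; i < scores.length; i++) {
            if (scores[i] != -1) {
                kept[n++] = weights[i] * scores[i]; // weight is applied before aggregation
            } else if (!ignoreMissing) {
                return -1; // one undefined field makes the whole node undefined
            }
        }
        double[] v = Arrays.copyOf(kept, n);
        // with no defined scores, AVG/MAX/MIN yield NaN in this sketch and SUM yields 0
        switch (agg) {
            case AVG: return Arrays.stream(v).average().orElse(Double.NaN);
            case SUM: return Arrays.stream(v).sum();
            case MAX: return Arrays.stream(v).max().orElse(Double.NaN);
            case MIN: return Arrays.stream(v).min().orElse(Double.NaN);
            default:  return 0.0;
        }
    }

    // picks the next node name from the node score, following the README rules
    static String route(double score, double threshold, String positive, String negative, String undefined) {
        if (score == -1)
            return undefined;
        return score >= threshold ? positive : negative;
    }

    public static void main(String[] args) {
        // first field matched exactly (1.0), second field was missing (-1) and is ignored
        double score = evaluate(new double[]{1.0, -1}, new double[]{1.0, 1.0}, AggType.AVG, true);
        System.out.println(route(score, 0.5, "positive", "negative", "undefined"));
    }
}
```

As in the original, weights multiply the scores before aggregation, so with `SUM` or `MAX` a weight above 1 can push a node score past 1; the threshold has to be chosen with that in mind.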

@ -0,0 +1,33 @@
package eu.dnetlib.pace.tree;
import eu.dnetlib.pace.model.Field;
import org.apache.commons.lang.StringUtils;
import java.util.Map;
abstract class AbstractComparator implements Comparator {
Map<String, Number> params;
public AbstractComparator(Map<String, Number> params){
this.params = params;
}
@Override
public double compare(Field a, Field b) {
return 0.0;
}
public static double stringSimilarity(String s1, String s2) {
String longer = s1, shorter = s2;
if (s1.length() < s2.length()) { // longer should always have greater length
longer = s2; shorter = s1;
}
int longerLength = longer.length();
if (longerLength == 0) //if strings have 0 length return 0 (no similarity)
return 0.0;
return (longerLength - StringUtils.getLevenshteinDistance(longer, shorter)) / (double) longerLength;
}
}
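`stringSimilarity` is a normalized Levenshtein similarity: the length of the longer string minus the edit distance, divided by that length, with two empty strings scoring 0. The self-contained sketch below swaps commons-lang's `StringUtils.getLevenshteinDistance` for a hand-written two-row dynamic program so it runs without dependencies; `LevenshteinSketch` is a hypothetical name.

```java
// Hypothetical sketch of AbstractComparator.stringSimilarity with an inline Levenshtein distance.
class LevenshteinSketch {

    // classic two-row dynamic programme: insertion, deletion and substitution each cost 1
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++)
            prev[j] = j; // distance from the empty prefix of a
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp; // the finished row becomes prev
        }
        return prev[b.length()];
    }

    // same normalisation as the original: (longerLength - distance) / longerLength
    static double stringSimilarity(String s1, String s2) {
        int longerLength = Math.max(s1.length(), s2.length());
        if (longerLength == 0)
            return 0.0; // two empty strings: no similarity, as in the original
        return (longerLength - levenshtein(s1, s2)) / (double) longerLength;
    }

    public static void main(String[] args) {
        System.out.println(stringSimilarity("kitten", "sitting")); // 3 edits over 7 characters
    }
}
```

Because the score is relative to the longer string, punctuation and spacing cost points; this is why `CoauthorsMatch` strips dots and spaces before calling it.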

@ -0,0 +1,42 @@
package eu.dnetlib.pace.tree;
import eu.dnetlib.pace.model.Field;
import eu.dnetlib.pace.model.FieldList;
import java.util.List;
import java.util.Map;
@ComparatorClass("coauthorsMatch")
public class CoauthorsMatch extends AbstractComparator {
public CoauthorsMatch(Map<String, Number> params) {
super(params);
}
@Override
public double compare(Field a, Field b) {
final List<String> c1 = ((FieldList) a).stringList();
final List<String> c2 = ((FieldList) b).stringList();
int size1 = c1.size();
int size2 = c2.size();
//few coauthors or too many coauthors
if (size1 < params.getOrDefault("minCoauthors", 5).intValue() || size2 < params.getOrDefault("minCoauthors", 5).intValue() || (size1+size2 > params.getOrDefault("maxCoauthors", 200).intValue()))
return -1;
int coauthorship = 0;
for (String ca1: c1){
for (String ca2: c2){
if (stringSimilarity(ca1.replaceAll("\\.","").replaceAll(" ",""), ca2.replaceAll("\\.","").replaceAll(" ",""))>= params.getOrDefault("simTh", 0.7).doubleValue())
coauthorship++;
}
}
return coauthorship;
}
}
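`CoauthorsMatch` returns -1 when either author has fewer than `minCoauthors` (default 5) coauthors or the two lists together exceed `maxCoauthors` (default 200); otherwise it counts the coauthor pairs whose similarity reaches `simTh` (default 0.7) once dots and spaces are removed. The result is a raw count rather than a value in [0, 1], and every matching pair is counted, so a name that matches two entries on the other side counts twice. In the sketch below the similarity function is passed in as a parameter, an assumption made so the block runs on its own (the original calls `stringSimilarity`); `CoauthorsSketch` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BiFunction;

// Hypothetical sketch of CoauthorsMatch.compare with a pluggable similarity function.
class CoauthorsSketch {

    // -1 when either list is too short or both together are too long, else the count of similar pairs
    static double count(List<String> c1, List<String> c2, int minCoauthors, int maxCoauthors,
                        double simTh, BiFunction<String, String, Double> sim) {
        if (c1.size() < minCoauthors || c2.size() < minCoauthors || c1.size() + c2.size() > maxCoauthors)
            return -1;
        int coauthorship = 0;
        for (String a : c1)
            for (String b : c2)
                if (sim.apply(normalize(a), normalize(b)) >= simTh)
                    coauthorship++;
        return coauthorship;
    }

    // same normalisation as the original: drop dots, then spaces (case is kept)
    static String normalize(String s) {
        return s.replaceAll("\\.", "").replaceAll(" ", "");
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("J. Smith", "A. Rossi", "M. Bianchi", "L. Verdi", "P. Neri");
        List<String> b = Arrays.asList("J.Smith", "A Rossi", "X. Y", "Z. W", "K. Q");
        // exact equality stands in for stringSimilarity in this demo
        System.out.println(count(a, b, 5, 200, 0.7, (x, y) -> x.equals(y) ? 1.0 : 0.0));
    }
}
```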

@ -0,0 +1,10 @@
package eu.dnetlib.pace.tree;
import eu.dnetlib.pace.model.Field;
public interface Comparator {
//compares two fields and returns the similarity score, or -1 if undefined
public double compare(Field a, Field b);
}

@ -0,0 +1,14 @@
package eu.dnetlib.pace.tree;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
public @interface ComparatorClass {
public String value();
}

@ -0,0 +1,25 @@
package eu.dnetlib.pace.tree;
import eu.dnetlib.pace.model.Field;
import java.util.Map;
@ComparatorClass("exactMatch")
public class ExactMatch extends AbstractComparator {
public ExactMatch(Map<String, Number> params) {
super(params);
}
@Override
public double compare(Field a, Field b) {
if (a.stringValue().isEmpty() || b.stringValue().isEmpty())
return -1;
else if (a.stringValue().equals(b.stringValue()))
return 1;
else
return 0;
}
}

@ -0,0 +1,31 @@
package eu.dnetlib.pace.tree;
import eu.dnetlib.pace.model.Field;
import java.util.Map;
@ComparatorClass("similar")
public class SimilarMatch extends AbstractComparator {
public SimilarMatch(Map<String, Number> params) {
super(params);
}
@Override
public double compare(Field a, Field b) {
if (a.stringValue().isEmpty() || b.stringValue().isEmpty())
return -1; //undefined if one name is missing
//take only the first name
String firstname1 = a.stringValue().split(" ")[0];
String firstname2 = b.stringValue().split(" ")[0];
if (firstname1.toLowerCase().trim().replaceAll("\\.","").replaceAll("\\s","").length()<=2 || firstname2.toLowerCase().replaceAll("\\.", "").replaceAll("\\s","").length()<=2) //names of at most two characters (e.g. initials) are considered similar
return 1;
return stringSimilarity(firstname1,firstname2);
}
}

@ -0,0 +1,36 @@
package eu.dnetlib.pace.tree;
import eu.dnetlib.pace.model.Field;
import eu.dnetlib.pace.model.FieldListImpl;
import java.util.Map;
@ComparatorClass("topicsMatch")
public class TopicsMatch extends AbstractComparator {
public TopicsMatch(Map<String, Number> params) {
super(params);
}
@Override
public double compare(Field a, Field b) {
double[] t1 = ((FieldListImpl) a).doubleArray();
double[] t2 = ((FieldListImpl) b).doubleArray();
if (t1 == null || t2 == null)
return -1; //undefined if one or both authors have no topics
double area = 0.0;
double min_value[] = new double[t1.length];
for(int i=0; i<t1.length; i++){
min_value[i] = (t1[i]<t2[i])?t1[i]:t2[i];
area += min_value[i];
}
return area;
}
}
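`TopicsMatch` sums the element-wise minimum of the two topic vectors, which is the histogram-intersection similarity: for topic distributions that each sum to 1, the result lies in [0, 1]. The loop runs over `t1.length`, so it assumes both arrays have the same length and throws `ArrayIndexOutOfBoundsException` if the second one is shorter. The sketch below (hypothetical class `TopicsSketch`) adds a length check and treats a mismatch as undefined (-1); that choice is an assumption, not the committed behaviour.

```java
// Hypothetical sketch of TopicsMatch.compare as histogram intersection with a length check.
class TopicsSketch {

    // sum over i of min(t1[i], t2[i]); -1 when the comparison is undefined
    static double intersection(double[] t1, double[] t2) {
        if (t1 == null || t2 == null || t1.length != t2.length)
            return -1; // assumption: mismatched vectors are treated as undefined
        double area = 0.0;
        for (int i = 0; i < t1.length; i++)
            area += Math.min(t1[i], t2[i]);
        return area;
    }

    public static void main(String[] args) {
        double[] a = {0.5, 0.3, 0.2};
        double[] b = {0.2, 0.3, 0.5};
        System.out.println(intersection(a, b));
    }
}
```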

@ -0,0 +1,22 @@
package eu.dnetlib.pace.tree;
import eu.dnetlib.pace.model.Field;
import eu.dnetlib.pace.model.FieldList;
import java.util.List;
import java.util.Map;
@ComparatorClass("undefined")
public class UndefinedNode implements Comparator {
Map<String, Number> params;
@Override
public double compare(Field a, Field b) {
final List<String> sa = ((FieldList) a).stringList();
final List<String> sb = ((FieldList) b).stringList();
return 0;
}
}

@ -0,0 +1,21 @@
package eu.dnetlib.pace.tree.support;
import eu.dnetlib.pace.util.PaceException;
public enum AggType {
AVG,
SUM,
MAX,
MIN;
public static AggType getEnum(String value) {
try {
return AggType.valueOf(value);
}
catch (IllegalArgumentException e) {
throw new PaceException("Undefined aggregation type", e);
}
}
}

@ -0,0 +1,20 @@
package eu.dnetlib.pace.tree.support;
public enum MatchType {
ORCID_MATCH,
COAUTHORS_MATCH,
TOPICS_MATCH,
NO_MATCH,
UNDEFINED;
public static MatchType getEnum(String value) {
try {
return MatchType.valueOf(value);
}
catch (IllegalArgumentException e) {
return MatchType.UNDEFINED;
}
}
}

@ -41,9 +41,10 @@ public class BlockProcessor {
final Queue<MapDocument> q = prepare(documents);
if (q.size() > 1) {
log.debug("reducing key: '" + key + "' records: " + q.size());
// log.info("reducing key: '" + key + "' records: " + q.size());
//process(q, context);
process(simplifyQueue(q, key, context), context);
} else {
context.incrementCounter(dedupConf.getWf().getEntityType(), "records per hash key = 1", 1);
}
@ -109,7 +110,7 @@ public class BlockProcessor {
q.addAll(tempResults);
} else {
context.incrementCounter(wf.getEntityType(), String.format("Skipped records for count(%s) >= %s", wf.getOrderField(), wf.getGroupMaxSize()), tempResults.size());
log.debug("Skipped field: " + fieldRef + " - size: " + tempResults.size() + " - ngram: " + ngram);
// log.info("Skipped field: " + fieldRef + " - size: " + tempResults.size() + " - ngram: " + ngram);
}
}
@ -149,8 +150,8 @@ public class BlockProcessor {
if (!idCurr.equals(idPivot) && (fieldCurr != null)) {
final ScoreResult sr = algo.between(pivot, curr, dedupConf);
log.debug(sr.toString()+"SCORE "+ sr.getScore());
final ScoreResult sr = similarity(algo, pivot, curr);
// log.info(sr.toString()+"SCORE "+ sr.getScore());
emitOutput(sr, idPivot, idCurr, context);
i++;
}
@ -171,6 +172,15 @@ public class BlockProcessor {
}
}
private ScoreResult similarity(final PaceDocumentDistance algo, final MapDocument a, final MapDocument b) {
try {
return algo.between(a, b, dedupConf);
} catch(Throwable e) {
log.error(String.format("\nA: %s\n----------------------\nB: %s", a, b), e);
throw new IllegalArgumentException(e);
}
}
private boolean mustSkip(final String idPivot) {
return dedupConf.getWf().getSkipList().contains(getNsPrefix(idPivot));
}

@ -2,11 +2,11 @@ package eu.dnetlib.pace.util;
public class PaceException extends RuntimeException {
public PaceException(String s, Throwable e) {
public PaceException(String s, Throwable e){
super(s, e);
}
public PaceException(String s) {
public PaceException(String s){
super(s);
}

@ -7,6 +7,8 @@ import eu.dnetlib.pace.condition.ConditionClass;
import eu.dnetlib.pace.distance.DistanceAlgo;
import eu.dnetlib.pace.distance.DistanceClass;
import eu.dnetlib.pace.model.FieldDef;
import eu.dnetlib.pace.tree.Comparator;
import eu.dnetlib.pace.tree.ComparatorClass;
import org.reflections.Reflections;
import java.io.Serializable;
@ -20,6 +22,7 @@ public class PaceResolver implements Serializable {
private final Map<String, Class<ClusteringFunction>> clusteringFunctions;
private final Map<String, Class<ConditionAlgo>> conditionAlgos;
private final Map<String, Class<DistanceAlgo>> distanceAlgos;
private final Map<String, Class<Comparator>> comparators;
public PaceResolver() {
@ -34,13 +37,17 @@ public class PaceResolver implements Serializable {
this.distanceAlgos = new Reflections("eu.dnetlib").getTypesAnnotatedWith(DistanceClass.class).stream()
.filter(DistanceAlgo.class::isAssignableFrom)
.collect(Collectors.toMap(cl -> cl.getAnnotation(DistanceClass.class).value(), cl -> (Class<DistanceAlgo>)cl));
this.comparators = new Reflections("eu.dnetlib").getTypesAnnotatedWith(ComparatorClass.class).stream()
.filter(Comparator.class::isAssignableFrom)
.collect(Collectors.toMap(cl -> cl.getAnnotation(ComparatorClass.class).value(), cl -> (Class<Comparator>) cl));
}
public ClusteringFunction getClusteringFunction(String name, Map<String, Integer> params) throws PaceException {
try {
return clusteringFunctions.get(name).getDeclaredConstructor(Map.class).newInstance(params);
} catch (InstantiationException | IllegalAccessException | InvocationTargetException | NoSuchMethodException e) {
throw new PaceException(name + "not found", e);
throw new PaceException(name + " not found ", e);
}
}
@ -48,7 +55,7 @@ public class PaceResolver implements Serializable {
try {
return distanceAlgos.get(name).getDeclaredConstructor(Map.class).newInstance(params);
} catch (InstantiationException | IllegalAccessException | InvocationTargetException | NoSuchMethodException e) {
throw new PaceException(name + "not found", e);
throw new PaceException(name + " not found ", e);
}
}
@ -56,7 +63,15 @@ public class PaceResolver implements Serializable {
try {
return conditionAlgos.get(name).getDeclaredConstructor(String.class, List.class).newInstance(name, fields);
} catch (InstantiationException | IllegalAccessException | InvocationTargetException | NoSuchMethodException e) {
throw new PaceException(name + "not found", e);
throw new PaceException(name + " not found ", e);
}
}
public Comparator getComparator(String name, Map<String, Number> params) throws PaceException {
try {
return comparators.get(name).getDeclaredConstructor(Map.class).newInstance(params);
} catch (InstantiationException | IllegalAccessException | InvocationTargetException | NoSuchMethodException | NullPointerException e) {
throw new PaceException(name + " not found ", e);
}
}
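`PaceResolver` finds every class annotated with `@ComparatorClass` through a Reflections classpath scan, keeps those implementing `Comparator`, keys them by the annotation value, and instantiates them through a constructor taking a `Map`. A new comparator therefore needs both the annotation and a `Map<String, Number>` constructor. The sketch below reproduces the lookup with only the JDK: candidate classes are passed in explicitly instead of being scanned, `compare` takes strings instead of `Field`s, and `SketchComparatorClass`, `SketchComparator`, `SketchExact`, `SketchNoParams` and `ComparatorRegistry` are hypothetical stand-ins, not the library's types.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.HashMap;
import java.util.Map;

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface SketchComparatorClass { String value(); }

interface SketchComparator { double compare(String a, String b); }

@SketchComparatorClass("exactMatch")
class SketchExact implements SketchComparator {
    public SketchExact(Map<String, Number> params) { }
    public double compare(String a, String b) {
        if (a.isEmpty() || b.isEmpty()) return -1; // undefined when a value is missing
        return a.equals(b) ? 1 : 0;
    }
}

// annotated, but only has the implicit no-arg constructor
@SketchComparatorClass("noMapCtor")
class SketchNoParams implements SketchComparator {
    public double compare(String a, String b) { return 0; }
}

class ComparatorRegistry {
    private final Map<String, Class<? extends SketchComparator>> byName = new HashMap<>();

    ComparatorRegistry(Class<?>... candidates) {
        for (Class<?> c : candidates) {
            SketchComparatorClass ann = c.getAnnotation(SketchComparatorClass.class);
            // same filter as the original: annotated and implementing the interface
            if (ann != null && SketchComparator.class.isAssignableFrom(c))
                byName.put(ann.value(), c.asSubclass(SketchComparator.class));
        }
    }

    SketchComparator get(String name, Map<String, Number> params) {
        Class<? extends SketchComparator> cl = byName.get(name);
        if (cl == null) // explicit check instead of catching NullPointerException
            throw new IllegalArgumentException(name + " not found");
        try {
            return cl.getDeclaredConstructor(Map.class).newInstance(params);
        } catch (ReflectiveOperationException e) { // e.g. NoSuchMethodException: no Map constructor
            throw new IllegalArgumentException(name + " not found", e);
        }
    }

    public static void main(String[] args) {
        ComparatorRegistry registry = new ComparatorRegistry(SketchExact.class, SketchNoParams.class);
        System.out.println(registry.get("exactMatch", new HashMap<>()).compare("0000-0002", "0000-0002"));
    }
}
```

By the same reasoning, a comparator that declares no `Map` constructor (such as `UndefinedNode` above) makes `getDeclaredConstructor(Map.class)` throw `NoSuchMethodException`, which `getComparator` reports as "not found".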

File diff suppressed because it is too large

@ -611,7 +611,6 @@ terzo
th
ti
titolo
torino
tra
tranne
tre

@ -1,11 +1,11 @@
key::1;university;università;universitario;universitaria;université;universitaire;universitaires;universidad;universitade;Universität;Uniwersytet;университет;universiteit;πανεπιστήμιο
key::1;university;università;università studi;universitario;universitaria;université;universitaire;universitaires;universidad;universitade;Universität;Uniwersytet;университет;universiteit;πανεπιστήμιο
key::2;studies;studi;études;estudios;estudos;Studien;studia;исследования;studies;σπουδές
key::3;advanced;superiore;supérieur;supérieure;supérieurs;supérieures;avancado;avancados;fortgeschrittene;fortgeschritten;zaawansowany;передовой;gevorderd;gevorderde;προχωρημένος;προχωρημένη;προχωρημένο;προχωρημένες;προχωρημένα
key::4;institute;istituto;institut;instituto;instituto;Institut;instytut;институт;instituut;ινστιτούτο
key::5;hospital;ospedale;hôpital;hospital;hospital;Krankenhaus;szpital;больница;ziekenhuis;νοσοκομείο
key::6;research;ricerca;recherche;investigacion;pesquisa;Forschung;badania;исследования;onderzoek;έρευνα
key::6;research;ricerca;recherche;investigacion;pesquisa;Forschung;badania;исследования;onderzoek;έρευνα;erevna;erevnas
key::7;college;collegio;université;colegio;faculdade;Hochschule;Szkoła Wyższa;Высшая школа;universiteit;κολλέγιο
key::8;foundation;fondazione;fondation;fundación;fundação;Stiftung;Fundacja;фонд;stichting;ίδρυμα
key::8;foundation;fondazione;fondation;fundación;fundação;Stiftung;Fundacja;фонд;stichting;ίδρυμα;idryma
key::9;center;centro;centre;centro;centro;zentrum;centrum;центр;centrum;κέντρο
key::10;national;nazionale;national;nationale;nationaux;nationales;nacional;nacional;national;krajowy;национальный;nationaal;nationale;εθνικό
key::11;association;associazione;association;asociación;associação;Verein;verband;stowarzyszenie;ассоциация;associatie
@ -44,4 +44,60 @@ key::43;initiative;iniziativa;initiative;инициатива;initiatief;πρω
key::44;academic;accademico;académique;universitaire;акадеческий academisch;ακαδημαϊκός;ακαδημαϊκή;ακαδημαϊκό;ακαδημαϊκές;ακαδημαϊκοί
key::45;institution;istituzione;institution;институциональный;instelling;ινστιτούτο
key::46;division;divisione;division;отделение;divisie;τμήμα
key::47;committee;comitato;comité;комитет;commissie;επιτροπή
key::47;committee;comitato;comité;комитет;commissie;επιτροπή
key::48;promotion;promozione;продвижение;proothisis;forderung
key::49;medical;medicine;clinical;medicina;clinici;médico;medicina;clínica;médico;medicina;clínica;medizinisch;Medizin;klinisch;medisch;geneeskunde;klinisch;ιατρικός;ιατρική;ιατρικό;ιατρικά;κλινικός;κλινική;κλινικό;κλινικά;tıbbi;tıp;klinik;orvosi;orvostudomány;klinikai;zdravniški;medicinski;klinični;meditsiini;kliinik;kliiniline;
key::50;technology;technological;tecnologia;tecnologie;tecnología;tecnológico;tecnologia;tecnológico;Technologie;technologisch;technologie;technologisch;τεχνολογία;τεχνολογικός;τεχνολογική;τεχνολογικό;teknoloji;teknolojik;technológia;technológiai;tehnologija;tehnološki;tehnoloogia;tehnoloogiline;
key::51;science;scientific;scienza;scientifiche;scienze;ciencia;científico;ciência;científico;Wissenschaft;wissenschaftlich;wetenschap;wetenschappelijk;επιστήμη;επιστημονικός;επιστημονική;επιστημονικό;επιστημονικά;bilim;bilimsel;tudomány;tudományos;znanost;znanstveni;teadus;teaduslik;
key::52;engineering;ingegneria;ingeniería;engenharia;Ingenieurwissenschaft;ingenieurswetenschappen;bouwkunde;μηχανικός;μηχανική;μηχανικό;mühendislik;mérnöki;Inženirstvo;inseneeria;inseneri;
key::53;management;gestione;gestionale;gestionali;gestión;administración;gestão;administração;Verwaltung;management;διαχείριση;yönetim;menedzsment;vodstvo;upravljanje;management;juhtkond;juhtimine;haldus;
key::54;energy;energia;energía;energia;Energie;energie;ενέργεια;enerji;energia;energija;energia;
key::55;agricultural;agriculture;agricoltura;agricole;agrícola;agricultura;agrícola;agricultura;landwirtschaftlich;Landwirtschaft;landbouwkundig;landbouw;αγροτικός;αγροτική;αγροτικό;γεωργικός;γεωργική;γεωργικό;γεωργία;tarımsal;tarım;mezőgazdasági;mezőgazdaság;poljedelski;poljedelstvo;põllumajandus;põllumajanduslik;
key::56;information;informazione;información;informação;Information;informatie;πληροφορία;bilgi;információ;informacija;informatsioon;
key::57;social;sociali;social;social;Sozial;sociaal;maatschappelijk;κοινωνικός;κοινωνική;κοινωνικό;κοινωνικά;sosyal;szociális;družbeni;sotsiaal;sotsiaalne;
key::58;environmental;ambiente;medioambiental;ambiente;medioambiente;meioambiente;Umwelt;milieu;milieuwetenschap;milieukunde;περιβαλλοντικός;περιβαλλοντική;περιβαλλοντικό;περιβαλλοντικά;çevre;környezeti;okoliški;keskonna;;
key::59;business;economia;economiche;economica;negocio;empresa;negócio;Unternehmen;bedrijf;bedrijfskunde;επιχείρηση;iş;üzleti;posel;ettevõte/äri;
key::60;pharmaceuticals;pharmacy;farmacia;farmaceutica;farmacéutica;farmacia;farmacêutica;farmácia;Pharmazeutika;Arzneimittelkunde;farmaceutica;geneesmiddelen;apotheek;φαρμακευτικός;φαρμακευτική;φαρμακευτικό;φαρμακευτικά;φαρμακείο;ilaç;eczane;gyógyszerészeti;gyógyszertár;farmacevtika;lekarništvo;farmaatsia;farmatseutiline;
key::61;healthcare;salute;atenciónmédica;cuidadodelasalud;cuidadoscomasaúde;Gesundheitswesen;gezondheidszorg;ιατροφαρμακευτικήπερίθαλψη;sağlıkhizmeti;egészségügy;zdravstvo;tervishoid;tervishoiu;
key::62;history;storia;historia;história;Geschichte;geschiedenis;geschiedkunde;ιστορία;tarih;történelem;zgodovina;ajalugu;
key::63;materials;materiali;materia;materiales;materiais;materialen;υλικά;τεκμήρια;malzemeler;anyagok;materiali;materjalid;vahendid;
key::64;economics;economia;economiche;economica;economía;economia;Wirtschaft;economie;οικονομικά;οικονομικέςεπιστήμες;ekonomi;közgazdaságtan;gospodarstvo;ekonomija;majanduslik;majandus;
key::65;therapeutics;terapeutica;terapéutica;terapêutica;therapie;θεραπευτική;tedavibilimi;gyógykezelés;terapevtika;terapeutiline;ravi;
key::66;oncology;oncologia;oncologico;oncología;oncologia;Onkologie;oncologie;ογκολογία;onkoloji;onkológia;onkologija;onkoloogia;
key::67;natural;naturali;naturale;natural;natural;natürlich;natuurlijk;φυσικός;φυσική;φυσικό;φυσικά;doğal;természetes;naraven;loodus;
key::68;educational;educazione;pedagogia;educacional;educativo;educacional;pädagogisch;educatief;εκπαιδευτικός;εκπαιδευτική;εκπαιδευτικό;εκπαιδευτικά;eğitimsel;oktatási;izobraževalen;haridus;hariduslik;
key::69;biomedical;biomedica;biomédico;biomédico;biomedizinisch;biomedisch;βιοιατρικός;βιοιατρική;βιοιατρικό;βιοιατρικά;biyomedikal;orvosbiológiai;biomedicinski;biomeditsiiniline;
key::70;veterinary;veterinaria;veterinarie;veterinaria;veterinária;tierärtzlich;veterinair;veeartsenijlkunde;κτηνιατρικός;κτηνιατρική;κτηνιατρικό;κτηνιατρικά;veteriner;állatorvosi;veterinar;veterinarski;veterinaaria;
key::71;chemistry;chimica;química;química;Chemie;chemie;scheikunde;χημεία;kimya;kémia;kemija;keemia;
key::72;security;sicurezza;seguridad;segurança;Sicherheit;veiligheid;ασφάλεια;güvenlik;biztonsági;varnost;turvalisus;julgeolek;
key::73;biotechnology;biotecnologia;biotecnologie;biotecnología;biotecnologia;Biotechnologie;biotechnologie;βιοτεχνολογία;biyoteknoloji;biotechnológia;biotehnologija;biotehnoloogia;
key::74;military;militare;militari;militar;militar;Militär;militair;leger;στρατιωτικός;στρατιωτική;στρατιωτικό;στρατιωτικά;askeri;katonai;vojaški;vojni;militaar;
key::75;theological;teologia;teologico;teológico;tecnológica;theologisch;theologisch;θεολογικός;θεολογική;θεολογικό;θεολογικά;teolojik;technológiai;teološki;teoloogia;usuteadus;teoloogiline;
key::76;electronics;elettronica;electrónica;eletrônicos;Elektronik;elektronica;ηλεκτρονική;elektronik;elektronika;elektronika;elektroonika;
key::77;forestry;forestale;forestali;silvicultura;forestal;floresta;Forstwirtschaft;bosbouw;δασοκομία;δασολογία;ormancılık;erdészet;gozdarstvo;metsandus;
key::78;maritime;marittima;marittime;marittimo;marítimo;marítimo;maritiem;ναυτικός;ναυτική;ναυτικό;ναυτικά;ναυτιλιακός;ναυτιλιακή;ναυτιλιακό;ναυτιλιακά;θαλάσσιος;θαλάσσια;θαλάσσιο;denizcilik;tengeri;morski;mere;merendus;
key::79;sports;sport;deportes;esportes;Sport;sport;sportwetenschappen;άθληση;γυμναστικήδραστηριότητα;spor;sport;šport;sport;spordi;
key::80;surgery;chirurgia;chirurgiche;cirugía;cirurgia;Chirurgie;chirurgie;heelkunde;εγχείρηση;επέμβαση;χειρουργικήεπέμβαση;cerrahi;sebészet;kirurgija;kirurgia;
key::81;cultural;culturale;culturali;cultura;cultural;cultural;kulturell;cultureel;πολιτιστικός;πολιτιστική;πολιτιστικό;πολιτισμικός;πολιτισμική;πολιτισμικό;kültürel;kultúrális;kulturni;kultuuri;kultuuriline;
key::82;computerscience;informatica;ordenador;computadora;informática;computación;cienciasdelacomputación;ciênciadacomputação;Computer;computer;υπολογιστής;ηλεκτρονικόςυπολογιστής;bilgisayar;számítógép;računalnik;arvuti;
key::83;finance;financial;finanza;finanziarie;finanza;financiero;finanças;financeiro;Finanzen;finanziell;financiën;financieel;χρηματοοικονομικά;χρηματοδότηση;finanse;finansal;pénzügy;pénzügyi;finance;finančni;finants;finantsiline;
key::84;communication;comunicazione;comuniciación;comunicação;Kommunikation;communication;επικοινωνία;iletişim;kommunikáció;komuniciranje;kommunikatsioon;
key::85;justice;giustizia;justicia;justiça;Recht;Justiz;justitie;gerechtigheid;δικαιοσύνη;υπουργείοδικαιοσύνης;δίκαιο;adalet;igazságügy;pravo;õigus;
key::86;aerospace;aerospaziale;aerospaziali;aeroespacio;aeroespaço;Luftfahrt;luchtvaart;ruimtevaart;αεροπορικός;αεροπορική;αεροπορικό;αεροναυπηγικός;αεροναυπηγική;αεροναυπηγικό;αεροναυπηγικά;havacılıkveuzay;légtér;zrakoplovstvo;atmosfäär;kosmos;
key::87;dermatology;dermatologia;dermatología;dermatologia;Dermatologie;dermatologie;δρματολογία;dermatoloji;bőrgyógyászat;dermatológia;dermatologija;dermatoloogia;
key::88;architecture;architettura;arquitectura;arquitetura;Architektur;architectuur;αρχιτεκτονική;mimarlık;építészet;arhitektura;arhitektuur;
key::89;mathematics;matematica;matematiche;matemáticas;matemáticas;Mathematik;wiskunde;mathematica;μαθηματικά;matematik;matematika;matematika;matemaatika;
key::90;language;lingue;linguistica;linguistiche;lenguaje;idioma;língua;idioma;Sprache;taal;taalkunde;γλώσσα;dil;nyelv;jezik;keel;
key::91;neuroscience;neuroscienza;neurociencia;neurociência;Neurowissenschaft;neurowetenschappen;νευροεπιστήμη;nörobilim;idegtudomány;nevroznanost;neuroteadused;
key::92;automation;automazione;automatización;automação;Automatisierung;automatisering;αυτοματοποίηση;otomasyon;automatizálás;avtomatizacija;automatiseeritud;
key::93;pediatric;pediatria;pediatriche;pediatrico;pediátrico;pediatría;pediátrico;pediatria;pädiatrisch;pediatrische;παιδιατρική;pediatrik;gyermekgyógyászat;pediatrija;pediaatria;
key::94;photonics;fotonica;fotoniche;fotónica;fotônica;Photonik;fotonica;φωτονική;fotonik;fotonika;fotonika;fotoonika;
key::95;mechanics;meccanica;meccaniche;mecánica;mecânica;Mechanik;Maschinenbau;mechanica;werktuigkunde;μηχανικής;mekanik;gépészet;mehanika;mehaanika;
key::96;psychiatrics;psichiatria;psichiatrica;psichiatriche;psiquiatría;psiquiatria;Psychiatrie;psychiatrie;ψυχιατρική;psikiyatrik;pszihiátria;psihiatrija;psühhaatria;
key::97;psychology;fisiologia;psicología;psicologia;Psychologie;psychologie;ψυχολογία;psikoloji;pszihológia;psihologija;psühholoogia;
key::98;automotive;industriaautomobilistica;industriadelautomóvil;automotriz;industriaautomotriz;automotivo;Automobilindustrie;autoindustrie;αυτοκίνητος;αυτοκίνητη;αυτοκίνητο;αυτοκινούμενος;αυτοκινούμενη;αυτοκινούμενο;αυτοκινητιστικός;αυτοκινητιστική;αυτοκινητιστικό;otomotiv;autóipari;samogiben;avtomobilskaindustrija;auto-;
key::99;neurology;neurologia;neurologiche;neurología;neurologia;Neurologie;neurologie;zenuwleer;νευρολογία;nöroloji;neurológia;ideggyógyászat;nevrologija;neuroloogia;
key::100;geology;geologia;geologiche;geología;geologia;Geologie;geologie;aardkunde;γεωλογία;jeoloji;geológia;földtudomány;geologija;geoloogia;
key::101;microbiology;microbiologia;micro-biologia;microbiologiche;microbiología;microbiologia;Mikrobiologie;microbiologie;μικροβιολογία;mikrobiyoloji;mikrobiológia;mikrobiologija;mikrobioloogia;
key::102;informatics;informatica;informática;informática;informatica;
key:103;forschungsgemeinschaft;comunita ricerca;research community;research foundation;research association

key::1;university;università;università studi;universitario;universitaria;université;universitaire;universitaires;universidad;universitade;Universität;Uniwersytet;университет;universiteit;πανεπιστήμιο
key::2;studies;studi;études;estudios;estudos;Studien;studia;исследования;studies;σπουδές
key::3;advanced;superiore;supérieur;supérieure;supérieurs;supérieures;avancado;avancados;fortgeschrittene;fortgeschritten;zaawansowany;передовой;gevorderd;gevorderde;προχωρημένος;προχωρημένη;προχωρημένο;προχωρημένες;προχωρημένα
key::4;institute;istituto;institut;instituto;instituto;Institut;instytut;институт;instituut;ινστιτούτο
key::5;hospital;ospedale;hôpital;hospital;hospital;Krankenhaus;szpital;больница;ziekenhuis;νοσοκομείο
key::6;research;ricerca;recherche;investigacion;pesquisa;Forschung;badania;исследования;onderzoek;έρευνα;erevna;erevnas
key::7;college;collegio;université;colegio;faculdade;Hochschule;Szkoła Wyższa;Высшая школа;universiteit;κολλέγιο
key::8;foundation;fondazione;fondation;fundación;fundação;Stiftung;Fundacja;фонд;stichting;ίδρυμα;idryma
key::9;center;centro;centre;centro;centro;zentrum;centrum;центр;centrum;κέντρο
key::10;national;nazionale;national;nationale;nationaux;nationales;nacional;nacional;national;krajowy;национальный;nationaal;nationale;εθνικό
key::11;association;associazione;association;asociación;associação;Verein;verband;stowarzyszenie;ассоциация;associatie
key::44;academic;accademico;académique;universitaire;академический;academisch;ακαδημαϊκός;ακαδημαϊκή;ακαδημαϊκό;ακαδημαϊκές;ακαδημαϊκοί
key::45;institution;istituzione;institution;институциональный;instelling;ινστιτούτο
key::46;division;divisione;division;отделение;divisie;τμήμα
key::47;committee;comitato;comité;комитет;commissie;επιτροπή
key::48;promotion;promozione;продвижение;proothisis;forderung
key::49;medical;medicine;clinical;medicina;clinici;médico;medicina;clínica;médico;medicina;clínica;medizinisch;Medizin;klinisch;medisch;geneeskunde;klinisch;ιατρικός;ιατρική;ιατρικό;ιατρικά;κλινικός;κλινική;κλινικό;κλινικά;tıbbi;tıp;klinik;orvosi;orvostudomány;klinikai;zdravniški;medicinski;klinični;meditsiini;kliinik;kliiniline;
key::50;technology;technological;tecnologia;tecnologie;tecnología;tecnológico;tecnologia;tecnológico;Technologie;technologisch;technologie;technologisch;τεχνολογία;τεχνολογικός;τεχνολογική;τεχνολογικό;teknoloji;teknolojik;technológia;technológiai;tehnologija;tehnološki;tehnoloogia;tehnoloogiline;
key::51;science;scientific;scienza;scientifiche;scienze;ciencia;científico;ciência;científico;Wissenschaft;wissenschaftlich;wetenschap;wetenschappelijk;επιστήμη;επιστημονικός;επιστημονική;επιστημονικό;επιστημονικά;bilim;bilimsel;tudomány;tudományos;znanost;znanstveni;teadus;teaduslik;
key::52;engineering;ingegneria;ingeniería;engenharia;Ingenieurwissenschaft;ingenieurswetenschappen;bouwkunde;μηχανικός;μηχανική;μηχανικό;mühendislik;mérnöki;Inženirstvo;inseneeria;inseneri;
key::53;management;gestione;gestionale;gestionali;gestión;administración;gestão;administração;Verwaltung;management;διαχείριση;yönetim;menedzsment;vodstvo;upravljanje;management;juhtkond;juhtimine;haldus;
key::54;energy;energia;energía;energia;Energie;energie;ενέργεια;enerji;energia;energija;energia;
key::55;agricultural;agriculture;agricoltura;agricole;agrícola;agricultura;agrícola;agricultura;landwirtschaftlich;Landwirtschaft;landbouwkundig;landbouw;αγροτικός;αγροτική;αγροτικό;γεωργικός;γεωργική;γεωργικό;γεωργία;tarımsal;tarım;mezőgazdasági;mezőgazdaság;poljedelski;poljedelstvo;põllumajandus;põllumajanduslik;
key::56;information;informazione;información;informação;Information;informatie;πληροφορία;bilgi;információ;informacija;informatsioon;
key::57;social;sociali;social;social;Sozial;sociaal;maatschappelijk;κοινωνικός;κοινωνική;κοινωνικό;κοινωνικά;sosyal;szociális;družbeni;sotsiaal;sotsiaalne;
key::58;environmental;ambiente;medioambiental;ambiente;medioambiente;meioambiente;Umwelt;milieu;milieuwetenschap;milieukunde;περιβαλλοντικός;περιβαλλοντική;περιβαλλοντικό;περιβαλλοντικά;çevre;környezeti;okoliški;keskkonna;
key::59;business;economia;economiche;economica;negocio;empresa;negócio;Unternehmen;bedrijf;bedrijfskunde;επιχείρηση;iş;üzleti;posel;ettevõte/äri;
key::60;pharmaceuticals;pharmacy;farmacia;farmaceutica;farmacéutica;farmacia;farmacêutica;farmácia;Pharmazeutika;Arzneimittelkunde;farmaceutica;geneesmiddelen;apotheek;φαρμακευτικός;φαρμακευτική;φαρμακευτικό;φαρμακευτικά;φαρμακείο;ilaç;eczane;gyógyszerészeti;gyógyszertár;farmacevtika;lekarništvo;farmaatsia;farmatseutiline;
key::61;healthcare;salute;atenciónmédica;cuidadodelasalud;cuidadoscomasaúde;Gesundheitswesen;gezondheidszorg;ιατροφαρμακευτικήπερίθαλψη;sağlıkhizmeti;egészségügy;zdravstvo;tervishoid;tervishoiu;
key::62;history;storia;historia;história;Geschichte;geschiedenis;geschiedkunde;ιστορία;tarih;történelem;zgodovina;ajalugu;
key::63;materials;materiali;materia;materiales;materiais;materialen;υλικά;τεκμήρια;malzemeler;anyagok;materiali;materjalid;vahendid;
key::64;economics;economia;economiche;economica;economía;economia;Wirtschaft;economie;οικονομικά;οικονομικέςεπιστήμες;ekonomi;közgazdaságtan;gospodarstvo;ekonomija;majanduslik;majandus;
key::65;therapeutics;terapeutica;terapéutica;terapêutica;therapie;θεραπευτική;tedavibilimi;gyógykezelés;terapevtika;terapeutiline;ravi;
key::66;oncology;oncologia;oncologico;oncología;oncologia;Onkologie;oncologie;ογκολογία;onkoloji;onkológia;onkologija;onkoloogia;
key::67;natural;naturali;naturale;natural;natural;natürlich;natuurlijk;φυσικός;φυσική;φυσικό;φυσικά;doğal;természetes;naraven;loodus;
key::68;educational;educazione;pedagogia;educacional;educativo;educacional;pädagogisch;educatief;εκπαιδευτικός;εκπαιδευτική;εκπαιδευτικό;εκπαιδευτικά;eğitimsel;oktatási;izobraževalen;haridus;hariduslik;
key::69;biomedical;biomedica;biomédico;biomédico;biomedizinisch;biomedisch;βιοιατρικός;βιοιατρική;βιοιατρικό;βιοιατρικά;biyomedikal;orvosbiológiai;biomedicinski;biomeditsiiniline;
key::70;veterinary;veterinaria;veterinarie;veterinaria;veterinária;tierärztlich;veterinair;veeartsenijkunde;κτηνιατρικός;κτηνιατρική;κτηνιατρικό;κτηνιατρικά;veteriner;állatorvosi;veterinar;veterinarski;veterinaaria;
key::71;chemistry;chimica;química;química;Chemie;chemie;scheikunde;χημεία;kimya;kémia;kemija;keemia;
key::72;security;sicurezza;seguridad;segurança;Sicherheit;veiligheid;ασφάλεια;güvenlik;biztonsági;varnost;turvalisus;julgeolek;
key::73;biotechnology;biotecnologia;biotecnologie;biotecnología;biotecnologia;Biotechnologie;biotechnologie;βιοτεχνολογία;biyoteknoloji;biotechnológia;biotehnologija;biotehnoloogia;
key::74;military;militare;militari;militar;militar;Militär;militair;leger;στρατιωτικός;στρατιωτική;στρατιωτικό;στρατιωτικά;askeri;katonai;vojaški;vojni;militaar;
key::75;theological;teologia;teologico;teológico;teológica;theologisch;theologisch;θεολογικός;θεολογική;θεολογικό;θεολογικά;teolojik;teológiai;teološki;teoloogia;usuteadus;teoloogiline;
key::76;electronics;elettronica;electrónica;eletrônicos;Elektronik;elektronica;ηλεκτρονική;elektronik;elektronika;elektronika;elektroonika;
key::77;forestry;forestale;forestali;silvicultura;forestal;floresta;Forstwirtschaft;bosbouw;δασοκομία;δασολογία;ormancılık;erdészet;gozdarstvo;metsandus;
key::78;maritime;marittima;marittime;marittimo;marítimo;marítimo;maritiem;ναυτικός;ναυτική;ναυτικό;ναυτικά;ναυτιλιακός;ναυτιλιακή;ναυτιλιακό;ναυτιλιακά;θαλάσσιος;θαλάσσια;θαλάσσιο;denizcilik;tengeri;morski;mere;merendus;
key::79;sports;sport;deportes;esportes;Sport;sport;sportwetenschappen;άθληση;γυμναστικήδραστηριότητα;spor;sport;šport;sport;spordi;
key::80;surgery;chirurgia;chirurgiche;cirugía;cirurgia;Chirurgie;chirurgie;heelkunde;εγχείρηση;επέμβαση;χειρουργικήεπέμβαση;cerrahi;sebészet;kirurgija;kirurgia;
key::81;cultural;culturale;culturali;cultura;cultural;cultural;kulturell;cultureel;πολιτιστικός;πολιτιστική;πολιτιστικό;πολιτισμικός;πολιτισμική;πολιτισμικό;kültürel;kultúrális;kulturni;kultuuri;kultuuriline;
key::82;computerscience;informatica;ordenador;computadora;informática;computación;cienciasdelacomputación;ciênciadacomputação;Computer;computer;υπολογιστής;ηλεκτρονικόςυπολογιστής;bilgisayar;számítógép;računalnik;arvuti;
key::83;finance;financial;finanza;finanziarie;finanza;financiero;finanças;financeiro;Finanzen;finanziell;financiën;financieel;χρηματοοικονομικά;χρηματοδότηση;finanse;finansal;pénzügy;pénzügyi;finance;finančni;finants;finantsiline;
key::84;communication;comunicazione;comunicación;comunicação;Kommunikation;communication;επικοινωνία;iletişim;kommunikáció;komuniciranje;kommunikatsioon;
key::85;justice;giustizia;justicia;justiça;Recht;Justiz;justitie;gerechtigheid;δικαιοσύνη;υπουργείοδικαιοσύνης;δίκαιο;adalet;igazságügy;pravo;õigus;
key::86;aerospace;aerospaziale;aerospaziali;aeroespacio;aeroespaço;Luftfahrt;luchtvaart;ruimtevaart;αεροπορικός;αεροπορική;αεροπορικό;αεροναυπηγικός;αεροναυπηγική;αεροναυπηγικό;αεροναυπηγικά;havacılıkveuzay;légtér;zrakoplovstvo;atmosfäär;kosmos;
key::87;dermatology;dermatologia;dermatología;dermatologia;Dermatologie;dermatologie;δερματολογία;dermatoloji;bőrgyógyászat;dermatológia;dermatologija;dermatoloogia;
key::88;architecture;architettura;arquitectura;arquitetura;Architektur;architectuur;αρχιτεκτονική;mimarlık;építészet;arhitektura;arhitektuur;
key::89;mathematics;matematica;matematiche;matemáticas;matemáticas;Mathematik;wiskunde;mathematica;μαθηματικά;matematik;matematika;matematika;matemaatika;
key::90;language;lingue;linguistica;linguistiche;lenguaje;idioma;língua;idioma;Sprache;taal;taalkunde;γλώσσα;dil;nyelv;jezik;keel;
key::91;neuroscience;neuroscienza;neurociencia;neurociência;Neurowissenschaft;neurowetenschappen;νευροεπιστήμη;nörobilim;idegtudomány;nevroznanost;neuroteadused;
key::92;automation;automazione;automatización;automação;Automatisierung;automatisering;αυτοματοποίηση;otomasyon;automatizálás;avtomatizacija;automatiseeritud;
key::93;pediatric;pediatria;pediatriche;pediatrico;pediátrico;pediatría;pediátrico;pediatria;pädiatrisch;pediatrische;παιδιατρική;pediatrik;gyermekgyógyászat;pediatrija;pediaatria;
key::94;photonics;fotonica;fotoniche;fotónica;fotônica;Photonik;fotonica;φωτονική;fotonik;fotonika;fotonika;fotoonika;
key::95;mechanics;meccanica;meccaniche;mecánica;mecânica;Mechanik;Maschinenbau;mechanica;werktuigkunde;μηχανικής;mekanik;gépészet;mehanika;mehaanika;
key::96;psychiatrics;psichiatria;psichiatrica;psichiatriche;psiquiatría;psiquiatria;Psychiatrie;psychiatrie;ψυχιατρική;psikiyatrik;pszichiátria;psihiatrija;psühhiaatria;
key::97;psychology;psicologia;psicología;psicologia;Psychologie;psychologie;ψυχολογία;psikoloji;pszichológia;psihologija;psühholoogia;
key::98;automotive;industriaautomobilistica;industriadelautomóvil;automotriz;industriaautomotriz;automotivo;Automobilindustrie;autoindustrie;αυτοκίνητος;αυτοκίνητη;αυτοκίνητο;αυτοκινούμενος;αυτοκινούμενη;αυτοκινούμενο;αυτοκινητιστικός;αυτοκινητιστική;αυτοκινητιστικό;otomotiv;autóipari;samogiben;avtomobilskaindustrija;auto-;
key::99;neurology;neurologia;neurologiche;neurología;neurologia;Neurologie;neurologie;zenuwleer;νευρολογία;nöroloji;neurológia;ideggyógyászat;nevrologija;neuroloogia;
key::100;geology;geologia;geologiche;geología;geologia;Geologie;geologie;aardkunde;γεωλογία;jeoloji;geológia;földtudomány;geologija;geoloogia;
key::101;microbiology;microbiologia;micro-biologia;microbiologiche;microbiología;microbiologia;Mikrobiologie;microbiologie;μικροβιολογία;mikrobiyoloji;mikrobiológia;mikrobiologija;mikrobioloogia;
key::102;informatics;informatica;informática;informática;informatica;
key::103;forschungsgemeinschaft;comunita ricerca;research community;research foundation;research association
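Each line of the translation map above pairs a code (`key::N`) with its synonyms in several languages, separated by `;`. The following is a minimal sketch of how such lines could be turned into a synonym-to-code lookup. `TranslationMapSketch` and its `parse` method are hypothetical and are not the dnet-dedup loader; lowercasing the synonyms, and letting the last line win when a synonym is listed under more than one code (for example `université`, which appears under both `key::1` and `key::7`), are assumptions of the sketch.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper (not part of dnet-dedup): builds a synonym -> code lookup
// from translation-map lines of the form "code;synonym1;synonym2;...".
public class TranslationMapSketch {

    public static Map<String, String> parse(List<String> lines) {
        Map<String, String> synonymToCode = new HashMap<>();
        for (String line : lines) {
            // String.split drops trailing empty strings, so a final ';' is harmless
            String[] tokens = line.split(";");
            if (tokens.length == 0 || tokens[0].trim().isEmpty()) {
                continue; // blank or separator-only line: no code to register
            }
            String code = tokens[0].trim();
            for (int i = 1; i < tokens.length; i++) {
                String synonym = tokens[i].trim().toLowerCase(Locale.ROOT);
                if (!synonym.isEmpty()) {
                    // assumption: a synonym listed under several codes keeps the last one
                    synonymToCode.put(synonym, code);
                }
            }
        }
        return synonymToCode;
    }

    public static void main(String[] args) {
        Map<String, String> map = parse(Arrays.asList(
                "key::1;university;università;université",
                "key::7;college;collegio;université",
                "key::6;research;ricerca;Forschung;erevna;erevnas;",
                "",
                "key::8;foundation;fondazione;idryma"));
        System.out.println(map.get("erevnas"));    // key::6
        System.out.println(map.get("forschung"));  // key::6
        System.out.println(map.get("université")); // key::7
    }
}
```

`Arrays.asList` is used instead of `List.of` because the build targets Java 1.8.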

View File

@ -1,14 +1,15 @@
package eu.dnetlib.pace;
import eu.dnetlib.pace.config.Type;
import eu.dnetlib.pace.model.Field;
import eu.dnetlib.pace.model.FieldListImpl;
import eu.dnetlib.pace.model.FieldValueImpl;
import org.junit.Test;
import org.apache.commons.io.IOUtils;
import java.io.IOException;
import java.io.StringWriter;
import java.util.List;
import java.util.stream.Collectors;
public abstract class AbstractPaceTest {
@ -34,4 +35,14 @@ public abstract class AbstractPaceTest {
return new FieldValueImpl(Type.URL, "url", s);
}
protected Field createFieldList(List<String> strings, String fieldName){
List<FieldValueImpl> fieldValueStream = strings.stream().map(s -> new FieldValueImpl(Type.String, fieldName, s)).collect(Collectors.toList());
FieldListImpl a = new FieldListImpl();
a.addAll(fieldValueStream);
return a;
}
}

View File

@ -1,12 +1,8 @@
package eu.dnetlib.pace.config;
import com.google.common.collect.Maps;
import eu.dnetlib.pace.AbstractPaceTest;
import org.junit.Test;
import java.io.IOException;
import java.util.Map;
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertNotNull;
@ -14,56 +10,31 @@ public class ConfigTest extends AbstractPaceTest {
@Test
public void dedupConfigSerializationTest() {
final DedupConfig cfgFromClasspath = DedupConfig.load(readFromClasspath("result.pace.conf.json"));
final String conf = cfgFromClasspath.toString();
assertNotNull(conf);
// System.out.println("*****SERIALIZED*****");
// System.out.println(conf);
// System.out.println("*****FROM CLASSPATH*****");
// System.out.println(readFromClasspath("result.pace.conf.json"));
final DedupConfig cfgFromSerialization = DedupConfig.load(conf);
assertEquals(cfgFromClasspath.toString(), cfgFromSerialization.toString());
assertNotNull(cfgFromClasspath);
assertNotNull(cfgFromSerialization);
}
@Test
public void dedupConfigTest() {
DedupConfig load = DedupConfig.load(readFromClasspath("org.curr.conf"));
assertNotNull(load);
System.out.println(load.toString());
}
@Test
public void testLoadDefaults() throws IOException {
final String entityType = "organization";
final String configurationId = "dedup-organization-simple";
final Map<String, String> config = Maps.newHashMap();
config.put("entityType", entityType);
config.put("configurationId", configurationId);
final DedupConfig dedupConf = DedupConfig.loadDefault(config);
//System.out.println("dedupConf = " + dedupConf);
assertNotNull(dedupConf);
assertNotNull(dedupConf.getWf());
assertEquals(dedupConf.getWf().getEntityType(), entityType);
assertEquals(dedupConf.getWf().getConfigurationId(), configurationId);
}
}

View File

@ -1,17 +1,17 @@
package eu.dnetlib.pace.distance;
import eu.dnetlib.pace.distance.algo.JaroWinklerNormalizedName;
import eu.dnetlib.pace.distance.algo.LevensteinTitleIgnoreVersion;
import org.apache.commons.lang.StringUtils;
import org.junit.Before;
import org.junit.Test;
import eu.dnetlib.pace.common.AbstractPaceFunctions;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import static junit.framework.Assert.assertEquals;
import static junit.framework.Assert.assertTrue;
public class DistanceAlgoTest extends AbstractPaceFunctions {
private final static String TEST_STRING = "Toshiba NB550D: è un netbook su piattaforma AMD Fusion⁽¹²⁾.";
@ -48,17 +48,59 @@ public class DistanceAlgoTest extends AbstractPaceFunctions {
@Test
public void testJaroWinklerNormalizedName() {
final JaroWinklerNormalizedName jaroWinklerNormalizedName = new JaroWinklerNormalizedName(params);
double result = jaroWinklerNormalizedName.distance("Universita di Pisa", "Universita di Parma");
System.out.println(result);
assertEquals(result, 0.0);
}
@Test
public void testJaroWinklerNormalizedName2() {
final JaroWinklerNormalizedName jaroWinklerNormalizedName = new JaroWinklerNormalizedName(params);
double result = jaroWinklerNormalizedName.distance("University of New York", "Università di New York");
System.out.println(result);
assertEquals(result, 1.0);
}
@Test
public void testJaroWinklerNormalizedName3() {
final JaroWinklerNormalizedName jaroWinklerNormalizedName = new JaroWinklerNormalizedName(params);
double result = jaroWinklerNormalizedName.distance("Biblioteca dell'Universita di Bologna", "Università di Bologna");
System.out.println("result = " + result);
assertEquals(result, 0.0);
}
@Test
public void testJaroWinklerNormalizedName4() {
final JaroWinklerNormalizedName jaroWinklerNormalizedName = new JaroWinklerNormalizedName(params);
double result = jaroWinklerNormalizedName.distance("Universita degli studi di Pisa", "Universita di Pisa");
System.out.println("result = " + result);
assertEquals(result, 1.0);
}
@Test
public void testJaroWinklerNormalizedName5() {
final JaroWinklerNormalizedName jaroWinklerNormalizedName = new JaroWinklerNormalizedName(params);
double result = jaroWinklerNormalizedName.distance("RESEARCH PROMOTION FOUNDATION", "IDRYMA PROOTHISIS EREVNAS");
System.out.println("result = " + result);
assertEquals(result, 1.0);
}
@Test
public void testJaroWinklerNormalizedName6() {
final JaroWinklerNormalizedName jaroWinklerNormalizedName = new JaroWinklerNormalizedName(params);
double result = jaroWinklerNormalizedName.distance("Fonds zur Förderung der wissenschaftlichen Forschung (Austrian Science Fund)", "Fonds zur Förderung der wissenschaftlichen Forschung");
System.out.println("result = " + result);
assertTrue(result> 0.9);
}
}
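The new translation-map entries `erevna;erevnas` (`key::6`), `idryma` (`key::8`) and `proothisis` (`key::48`) are what can make a transliterated Greek name comparable to its English form, as `testJaroWinklerNormalizedName5` expects. The sketch below illustrates only that idea: it reduces each name to the set of codes its words map to and compares the sets. `KeywordCodeSketch` and its tiny hard-coded map are hypothetical; this is not how `JaroWinklerNormalizedName` computes its score.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration, NOT the JaroWinklerNormalizedName implementation:
// each name is reduced to the set of translation-map codes its words map to.
public class KeywordCodeSketch {

    // Tiny excerpt of the translation map (synonym -> code), hard-coded for the example.
    static final Map<String, String> SYNONYM_TO_CODE = new HashMap<>();
    static {
        for (String s : Arrays.asList("research", "ricerca", "erevna", "erevnas")) SYNONYM_TO_CODE.put(s, "key::6");
        for (String s : Arrays.asList("foundation", "fondazione", "idryma")) SYNONYM_TO_CODE.put(s, "key::8");
        for (String s : Arrays.asList("promotion", "promozione", "proothisis")) SYNONYM_TO_CODE.put(s, "key::48");
    }

    // Returns the sorted set of codes found in the name; words without a code are ignored.
    public static Set<String> codes(String name) {
        Set<String> result = new TreeSet<>();
        for (String word : name.toLowerCase(Locale.ROOT).split("\\s+")) {
            String code = SYNONYM_TO_CODE.get(word);
            if (code != null) {
                result.add(code);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> english = codes("RESEARCH PROMOTION FOUNDATION");
        Set<String> greek = codes("IDRYMA PROOTHISIS EREVNAS");
        System.out.println(english);               // [key::48, key::6, key::8]
        System.out.println(english.equals(greek)); // true
    }
}
```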

View File

@ -0,0 +1,36 @@
{
"wf" : {
"threshold" : "0.9",
"dedupRun" : "001",
"entityType" : "organization",
"orderField" : "legalname",
"queueMaxSize" : "2000",
"groupMaxSize" : "10",
"slidingWindowSize" : "200",
"rootBuilder" : [ "organization", "projectOrganization_participation_isParticipant", "datasourceOrganization_provision_isProvidedBy" ],
"includeChildren" : "true"
},
"pace" : {
"clustering" : [
{ "name" : "sortedngrampairs", "fields" : [ "legalname" ], "params" : { "max" : 2, "ngramLen" : "3"} },
{ "name" : "suffixprefix", "fields" : [ "legalname" ], "params" : { "max" : 1, "len" : "3" } },
{ "name" : "urlclustering", "fields" : [ "websiteurl" ], "params" : { } }
],
"strictConditions" : [
{ "name" : "exactMatch", "fields" : [ "gridid" ] }
],
"conditions" : [
{ "name" : "exactMatch", "fields" : [ "country" ] },
{ "name" : "DomainExactMatch", "fields" : [ "websiteurl" ] }
],
"model" : [
{ "name" : "legalname", "algo" : "Null", "type" : "String", "weight" : "0", "ignoreMissing" : "false", "path" : "organization/metadata/legalname/value" },
{ "name" : "country", "algo" : "Null", "type" : "String", "weight" : "0", "ignoreMissing" : "true", "path" : "organization/metadata/country/classid" },
{ "name" : "legalshortname", "algo" : "JaroWinklerNormalizedName", "type" : "String", "weight" : "0.1", "ignoreMissing" : "true", "path" : "organization/metadata/legalshortname/value" },
{ "name" : "legalname", "algo" : "JaroWinklerNormalizedName", "type" : "String", "weight" : "0.9", "ignoreMissing" : "false", "path" : "organization/metadata/legalname/value", "params" : {"windowSize" : 4, "threshold" : 0.5} },
{ "name" : "websiteurl", "algo" : "Null", "type" : "URL", "weight" : "0", "ignoreMissing" : "true", "path" : "organization/metadata/websiteurl/value", "params" : { "host" : 0.5, "path" : 0.5 } },
{ "name" : "gridid", "algo" : "Null", "type" : "String", "weight" : "0.0", "ignoreMissing" : "true", "path" : "pid[qualifier#classid = {grid}]/value" }
],
"blacklists" : { }
}
}
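In the organization configuration above, the only non-zero model weights are 0.1 (`legalshortname`) and 0.9 (`legalname`), which sum to 1.0, and `wf.threshold` is 0.9. Assuming the per-field similarities in [0, 1] are combined as a weighted sum and compared with that threshold (an assumption; the sketch ignores `strictConditions`, `conditions` and `ignoreMissing`), the arithmetic works out as below. `WeightedScoreSketch` is hypothetical and is not the pace scoring code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: assumes field scores are combined as a weighted sum and
// compared with the "wf" threshold. The real pace scoring may differ.
public class WeightedScoreSketch {

    public static double weightedScore(Map<String, Double> weights, Map<String, Double> scores) {
        double total = 0.0;
        for (Map.Entry<String, Double> e : weights.entrySet()) {
            Double score = scores.get(e.getKey());
            if (score != null) { // assumption: a missing score contributes nothing
                total += e.getValue() * score;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Double> weights = new LinkedHashMap<>();
        weights.put("legalshortname", 0.1);
        weights.put("legalname", 0.9);

        Map<String, Double> scores = new LinkedHashMap<>();
        scores.put("legalshortname", 0.5);
        scores.put("legalname", 1.0);

        double score = weightedScore(weights, scores); // 0.1 * 0.5 + 0.9 * 1.0, about 0.95
        System.out.println(score >= 0.9);              // true: clears the threshold
    }
}
```

Under this assumption `legalshortname` can add at most 0.1, so a pair can only pass if `legalname` alone scores at least 8/9 (about 0.89).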

193
pom.xml
View File

@ -35,6 +35,8 @@
<url>https://issue.openaire.research-infrastructures.eu/projects/openaire</url>
</issueManagement>
<distributionManagement>
<repository>
<id>dnet45-releases</id>
@ -70,6 +72,18 @@
</snapshots>
</repository>
<repository>
<id>cloudera</id>
<name>Cloudera Repository</name>
<url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
<releases>
<enabled>true</enabled>
</releases>
<snapshots>
<enabled>false</enabled>
</snapshots>
</repository>
</repositories>
<build>
@ -77,24 +91,147 @@
<outputDirectory>target/classes</outputDirectory>
<finalName>${project.artifactId}-${project.version}</finalName>
<testOutputDirectory>target/test-classes</testOutputDirectory>
<!--*************************************************-->
<pluginManagement>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.6.0</version>
<configuration>
<source>1.8</source>
<target>1.8</target>
<encoding>${project.build.sourceEncoding}</encoding>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-jar-plugin</artifactId>
<version>3.0.2</version>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-source-plugin</artifactId>
<version>3.0.1</version>
<executions>
<execution>
<id>attach-sources</id>
<phase>verify</phase>
<goals>
<goal>jar-no-fork</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>
<version>2.19.1</version>
<configuration>
<redirectTestOutputToFile>true</redirectTestOutputToFile>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-javadoc-plugin</artifactId>
<version>2.10.4</version>
<configuration>
<detectLinks>true</detectLinks>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-dependency-plugin</artifactId>
<version>3.0.0</version>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-failsafe-plugin</artifactId>
<version>2.13</version>
<executions>
<execution>
<id>integration-test</id>
<goals>
<goal>integration-test</goal>
</goals>
</execution>
<execution>
<id>verify</id>
<goals>
<goal>verify</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</pluginManagement>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.6.0</version>
<configuration>
<source>1.8</source>
<target>1.8</target>
<encoding>${project.build.sourceEncoding}</encoding>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-release-plugin</artifactId>
<version>2.5.3</version>
</plugin>
</plugins>
<!--***********************************************************************-->
<!--<plugins>-->
<!--<plugin>-->
<!--<groupId>org.apache.maven.plugins</groupId>-->
<!--<artifactId>maven-compiler-plugin</artifactId>-->
<!--<version>3.6.0</version>-->
<!--<configuration>-->
<!--<source>1.8</source>-->
<!--<target>1.8</target>-->
<!--<encoding>${project.build.sourceEncoding}</encoding>-->
<!--</configuration>-->
<!--</plugin>-->
<!--<plugin>-->
<!--<groupId>org.apache.maven.plugins</groupId>-->
<!--<artifactId>maven-dependency-plugin</artifactId>-->
<!--<version>3.0.0</version>-->
<!--</plugin>-->
<!--<plugin>-->
<!--<groupId>org.apache.maven.plugins</groupId>-->
<!--<artifactId>maven-failsafe-plugin</artifactId>-->
<!--<version>2.13</version>-->
<!--<executions>-->
<!--<execution>-->
<!--<id>integration-test</id>-->
<!--<goals>-->
<!--<goal>integration-test</goal>-->
<!--</goals>-->
<!--</execution>-->
<!--<execution>-->
<!--<id>verify</id>-->
<!--<goals>-->
<!--<goal>verify</goal>-->
<!--</goals>-->
<!--</execution>-->
<!--</executions>-->
<!--</plugin>-->
<!--</plugins>-->
</build>
<dependencyManagement>
@ -128,14 +265,20 @@
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-databind</artifactId>
<version>${jackson.version}</version>
</dependency>
<dependency>
<groupId>org.codehaus.jackson</groupId>
<artifactId>jackson-mapper-asl</artifactId>
<version>1.9.13</version>
</dependency>
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-math3</artifactId>
<version>3.6.1</version>
</dependency>
<dependency>
<groupId>com.google.guava</groupId>
@ -171,11 +314,19 @@
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.11</artifactId>
<version>${spark.version}</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-graphx_2.11</artifactId>
<version>${spark.version}</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.11</artifactId>
<version>${spark.version}</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>junit</groupId>
@ -188,8 +339,22 @@
<artifactId>reflections</artifactId>
<version>0.9.10</version>
</dependency>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>${scala.version}</version>
</dependency>
<dependency>
<groupId>org.apache.oozie</groupId>
<artifactId>oozie-client</artifactId>
<version>5.1.0</version>
</dependency>
</dependencies>
</dependencyManagement>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
@ -199,6 +364,7 @@
<google.guava.version>15.0</google.guava.version>
<spark.version>2.2.0</spark.version>
<jackson.version>2.6.6</jackson.version>
<commons.lang.version>2.6</commons.lang.version>
<commons.io.version>2.4</commons.io.version>
@ -206,6 +372,7 @@
<commons.logging.version>1.1.3</commons.logging.version>
<junit.version>4.9</junit.version>
<scala.version>2.11.8</scala.version>
<maven.javadoc.failOnError>false</maven.javadoc.failOnError>
</properties>