miconis
|
4b2124a18e
|
implementation of the openorgs wfs, implementation of the raw_all wf to migrate openorgs db entities
|
2021-02-10 11:51:50 +01:00 |
Alessia Bardi
|
c4d1feca74
|
mapper test with validated link to project
|
2021-02-10 11:22:54 +01:00 |
Alessia Bardi
|
09fc7e2f78
|
serialization of validated flag on relationships
|
2021-02-10 11:22:09 +01:00 |
Claudio Atzori
|
bc458d1b54
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2021-02-09 16:27:30 +01:00 |
Claudio Atzori
|
82e6c50f3f
|
updated solr fields (authoridtypevalue, resultsubject, resultresourcetypename)
|
2021-02-09 16:27:04 +01:00 |
Claudio Atzori
|
62bd3c53ee
|
Merge branch 'master' into provision_indexing
|
2021-02-09 15:46:26 +01:00 |
Claudio Atzori
|
bae029f828
|
collection_java_xmx allows declaring the heap size allocated for the java actions involved in the metadata collection workflow
|
2021-02-08 18:07:23 +01:00 |
Claudio Atzori
|
bebc54d5bf
|
seq file storing native records is now compressed
|
2021-02-08 18:06:25 +01:00 |
Claudio Atzori
|
50add4c61b
|
added requestDelay to HttpConnector2 configuration; Aggregation workflow constants moved to dhp-common
|
2021-02-08 12:19:38 +01:00 |
Claudio Atzori
|
40df0f987d
|
better logging, WIP: collectorWorker error reporting; common functions moved to DHPUtils
|
2021-02-06 20:12:00 +01:00 |
Claudio Atzori
|
a8a758925e
|
better logging, WIP: collectorWorker error reporting
|
2021-02-05 19:18:05 +01:00 |
Claudio Atzori
|
730973679a
|
Merge branch 'hadoop_aggregator' of https://code-repo.d4science.org/D-Net/dnet-hadoop into hadoop_aggregator
|
2021-02-04 17:25:00 +01:00 |
Claudio Atzori
|
deb85706db
|
imported HttpConnector from https://svn.driver.research-infrastructures.eu/driver/dnet45/modules/dnet-modular-collector-service/trunk/src/main/java/eu/dnetlib/data/collector/plugins/HttpConnector.java as HttpConnector2
|
2021-02-04 17:24:52 +01:00 |
Sandro La Bruzzo
|
4dae5e605d
|
implemented messaging between collection worker and Dnet
|
2021-02-04 15:51:15 +01:00 |
Claudio Atzori
|
72c57b28fa
|
switched project version to 1.2.4-branch_hadoop_aggregator-SNAPSHOT
|
2021-02-04 14:08:18 +01:00 |
Claudio Atzori
|
40764cf626
|
better logging, WIP: collectorWorker error reporting
|
2021-02-04 14:06:02 +01:00 |
Sandro La Bruzzo
|
69c253710b
|
fixed test
|
2021-02-04 10:30:49 +01:00 |
Claudio Atzori
|
e04045089f
|
better logging, WIP: collectorWorker error reporting
|
2021-02-03 17:58:22 +01:00 |
Alessia Bardi
|
c67329d3ad
|
updated test for EU Open Data portal datasets
|
2021-02-03 17:06:48 +01:00 |
Claudio Atzori
|
0e8a4f9f1a
|
better logging, WIP: collectorWorker error reporting
|
2021-02-03 12:33:41 +01:00 |
Alessia Bardi
|
fd705404a1
|
tests for EU Open Data portal dataset mapping
|
2021-02-03 10:28:17 +01:00 |
Claudio Atzori
|
53884d12c2
|
code formatting
|
2021-02-02 14:38:03 +01:00 |
Claudio Atzori
|
ac46c247d2
|
code formatting
|
2021-02-02 14:24:00 +01:00 |
Claudio Atzori
|
bde14b149a
|
fixed transformation target paths
|
2021-02-02 12:49:29 +01:00 |
Claudio Atzori
|
ca4391aa1c
|
minor changes
|
2021-02-02 12:44:04 +01:00 |
Claudio Atzori
|
bb89b99b24
|
code formatting
|
2021-02-02 12:34:14 +01:00 |
Claudio Atzori
|
75807ea5ae
|
factored out constants
|
2021-02-02 12:28:21 +01:00 |
Sandro La Bruzzo
|
0634674add
|
implemented transformation test
|
2021-02-02 12:12:14 +01:00 |
Claudio Atzori
|
8eaa1fd4b4
|
WIP: metadata collection in INCREMENTAL mode and related test
|
2021-02-01 19:29:10 +01:00 |
Sandro La Bruzzo
|
bead34d11a
|
code refactor
|
2021-02-01 14:58:06 +01:00 |
Sandro La Bruzzo
|
6ff234d81b
|
Implemented a first prototype of incremental harvesting and transformation using readlock
|
2021-02-01 13:56:05 +01:00 |
Sandro La Bruzzo
|
b6b835ef49
|
update transformation Factory to get Transformation Rule by Id and not by Title
|
2021-02-01 08:49:42 +01:00 |
Sandro La Bruzzo
|
e423634cb6
|
RollBack in case of error WORKS!!!
|
2021-01-29 17:21:42 +01:00 |
Sandro La Bruzzo
|
8ee82576c6
|
Collection on Refresh WORKS!!!
|
2021-01-29 17:02:46 +01:00 |
Sandro La Bruzzo
|
0276180039
|
WIP mdstore
transaction implemented on hadoop side
|
2021-01-29 16:42:41 +01:00 |
Sandro La Bruzzo
|
0f8e2ecce6
|
Merged Datacite transform into this branch
|
2021-01-29 10:45:07 +01:00 |
Sandro La Bruzzo
|
99cf3a8ea4
|
Merged Datacite transform into this branch
|
2021-01-28 16:34:46 +01:00 |
Sandro La Bruzzo
|
686e7b507c
|
Merge branch 'hadoop_aggregator' of code-repo.d4science.org:D-Net/dnet-hadoop into aggregation_on_hadoop
|
2021-01-28 10:02:13 +01:00 |
Sandro La Bruzzo
|
98b9498b57
|
Removed old messaging system not quite used from collection and Transformation workflow
code refactor
|
2021-01-28 09:51:17 +01:00 |
Sandro La Bruzzo
|
184e7b3856
|
Implemented new Transformation using spark
|
2021-01-27 15:43:08 +01:00 |
Sandro La Bruzzo
|
150a617bd1
|
Merge pull request 'aggregation_on_hadoop' (#90) from sandro.labruzzo/dnet-hadoop:aggregation_on_hadoop into hadoop_aggregator
Wonderful code... You're the Best Sandro
|
2021-01-26 16:00:47 +01:00 |
Claudio Atzori
|
f1a852f278
|
align usage-stats workflow poms with latest snapshot version
|
2021-01-26 15:42:42 +01:00 |
Claudio Atzori
|
9c32119dc2
|
Merge pull request 'usage-stats-export-wf-v2' (#89) from dimitris.pierrakos/dnet-hadoop:usage-stats-export-wf-v2 into master
Thank you Dimitris!
|
2021-01-26 15:01:41 +01:00 |
Claudio Atzori
|
885e0dd926
|
[Cleaning] filter authors not providing word characters in the fullname
|
2021-01-26 09:48:53 +01:00 |
Claudio Atzori
|
2890511613
|
[Cleaning] normalise missing Result.country
|
2021-01-26 09:41:44 +01:00 |
Claudio Atzori
|
4eb9ed35b1
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2021-01-25 18:12:24 +01:00 |
Claudio Atzori
|
cd379eb5e3
|
[Cleaning] trying to avoid NPEs, this time by ruling out authors without a defined fullname
|
2021-01-25 18:11:49 +01:00 |
Alessia Bardi
|
505477f36f
|
format code
|
2021-01-25 18:02:49 +01:00 |
Alessia Bardi
|
ded6ed8d7d
|
no ',' author, if there are no authors in ODF records
|
2021-01-25 17:57:51 +01:00 |
Claudio Atzori
|
3465c8ccee
|
[Cleaning] trying to avoid NPEs
|
2021-01-25 16:54:53 +01:00 |
Sandro La Bruzzo
|
a54848a59c
|
Moved Vocabulary stuff to common module
|
2021-01-25 15:43:04 +01:00 |
Sandro La Bruzzo
|
ffb092b8d3
|
removed duplicate code HttpConnector.java
|
2021-01-25 15:05:37 +01:00 |
Sandro La Bruzzo
|
cda210a2ca
|
changed documentation since it didn't reflect the current status
|
2021-01-25 14:17:42 +01:00 |
Claudio Atzori
|
07a0ccfc96
|
[Cleaning] trying to avoid NPEs
|
2021-01-25 13:36:01 +01:00 |
miconis
|
c7e2d5a59a
|
minor changes
|
2021-01-25 12:40:45 +01:00 |
Claudio Atzori
|
34d653de41
|
[Cleaning] updated cleaning rule for DOIs
|
2021-01-22 14:16:33 +01:00 |
miconis
|
8fea29177c
|
refactoring, minor changes and implementation of the wf for openorgs with integration of organization phases into the scan wf
|
2021-01-18 16:48:08 +01:00 |
Dimitris
|
3e8d2a6b2d
|
Clean workflows
|
2021-01-15 16:19:12 +02:00 |
Michele Artini
|
cfbcdc95bc
|
fixed a wf param
|
2021-01-14 14:45:23 +01:00 |
Michele Artini
|
69ba3203c0
|
fixed a conflict
|
2021-01-14 14:43:25 +01:00 |
Michele Artini
|
b230d44411
|
fixed conflict
|
2021-01-14 14:32:31 +01:00 |
Michele Artini
|
b9d90e95b8
|
Added eventId to ShortEventMessage
|
2021-01-14 14:32:31 +01:00 |
Michele Artini
|
64b0b0bfb3
|
fixed a bug with invalid subject topic
|
2021-01-14 14:32:31 +01:00 |
Michele Artini
|
e3e0ab1de1
|
fixed a problem with join
|
2021-01-14 14:32:31 +01:00 |
Michele Artini
|
26a941315a
|
openaireId
|
2021-01-14 14:32:31 +01:00 |
Michele Artini
|
6f4d1a37f0
|
ES wf properties
|
2021-01-14 14:32:31 +01:00 |
Michele Artini
|
1391341d06
|
mkdir of output dir
|
2021-01-14 14:32:31 +01:00 |
Michele Artini
|
3c9cbd19f3
|
whitelist of topics
|
2021-01-14 14:32:31 +01:00 |
Michele Artini
|
467aa77279
|
workingDir and outputDir
|
2021-01-14 14:32:31 +01:00 |
Michele Artini
|
10f3f7eca7
|
workingDir and outputDir
|
2021-01-14 14:32:31 +01:00 |
Michele Artini
|
ff41a7b3a4
|
gzipped output
|
2021-01-14 14:32:31 +01:00 |
Claudio Atzori
|
80cf55ef2e
|
[Broker] fixed partitionEventsByOpendoarIds workflow parameter names
|
2021-01-13 16:24:30 +01:00 |
Claudio Atzori
|
41500669e2
|
[BIP! Scores integration] merged missing classes from bipFinder branch
|
2021-01-11 14:39:47 +01:00 |
Claudio Atzori
|
2a7a10809e
|
[BIP! Scores integration] merged missing classes from bipFinder branch
|
2021-01-11 10:05:02 +01:00 |
Claudio Atzori
|
d6686dd7cf
|
merged from master
|
2021-01-08 18:16:12 +01:00 |
Claudio Atzori
|
34229970e6
|
[BIP! Scores integration] Create updates as Result rather than subclasses; Result considers also metrics in the mergeFrom operation
|
2021-01-08 16:29:17 +01:00 |
Claudio Atzori
|
1361c9eb0c
|
[BIP! Scores integration] Create updates as Result rather than subclasses; Result considers also metrics in the mergeFrom operation
|
2021-01-07 10:07:30 +01:00 |
Claudio Atzori
|
ab2fe9266a
|
[DOIBoost] minor fixes in workflow definition
|
2021-01-05 10:26:39 +01:00 |
Claudio Atzori
|
7c722f3fdc
|
[DOIBoost] fixed typo
|
2021-01-05 10:25:54 +01:00 |
Claudio Atzori
|
8879704ba0
|
[DOIBoost] configurable ES server url and index name in crossref importer
|
2021-01-05 10:00:13 +01:00 |
Claudio Atzori
|
26e9d55c13
|
code formatting
|
2021-01-05 09:59:26 +01:00 |
Sandro La Bruzzo
|
7834a35768
|
avoid saving the intermediate dataset before generation of the Sequence file
|
2021-01-04 17:54:57 +01:00 |
Sandro La Bruzzo
|
e79445a8b4
|
minor fix addressing Claudio's objection
|
2021-01-04 17:39:25 +01:00 |
Sandro La Bruzzo
|
8765020b85
|
minor fix
|
2021-01-04 17:37:08 +01:00 |
Sandro La Bruzzo
|
b0dc92786f
|
defined a single oozie workflow for the generation of doiboost
|
2021-01-04 17:01:35 +01:00 |
Claudio Atzori
|
7185158942
|
ignore missing properties
|
2020-12-29 11:06:28 +01:00 |
Claudio Atzori
|
28460c2cd1
|
using com.fasterxml.jackson.databind.ObjectMapper instead of org.codehaus.jackson.map.ObjectMapper
|
2020-12-23 16:59:52 +01:00 |
Claudio Atzori
|
60649ac7d2
|
swapped expected and actual in tests, updated expected number of authors
|
2020-12-23 12:26:04 +01:00 |
Claudio Atzori
|
723b01f9e9
|
trivial: the fewer magic numbers and values around, the better
|
2020-12-23 12:22:48 +01:00 |
Claudio Atzori
|
7bfc35df5e
|
Merge pull request 'Changed typo in script names' (#82) from antonis.lempesis/dnet-hadoop:master into master
no need to! :)
|
2020-12-22 12:36:21 +01:00 |
Antonis Lempesis
|
be5969a8c2
|
Changed typo in script names
|
2020-12-22 13:33:32 +02:00 |
miconis
|
1e1aab83e3
|
implementation of the raw wf for openorgs: still not complete, some functionalities are missing
|
2020-12-21 11:58:21 +01:00 |
Claudio Atzori
|
6cb0dc3f43
|
extended ORCID cleaning procedure
|
2020-12-21 11:40:17 +01:00 |
Claudio Atzori
|
573a8a3272
|
Merge pull request 'Changed typo in script names' (#81) from antonis.lempesis/dnet-hadoop:master into master
ok! LGTM
|
2020-12-18 17:44:26 +01:00 |
Antonis Lempesis
|
2a074c3b2b
|
Changed typo in script names
|
2020-12-18 18:40:48 +02:00 |
Claudio Atzori
|
47270d9af5
|
lenient mock can be lenient
|
2020-12-18 15:38:59 +01:00 |
Claudio Atzori
|
2e503ee101
|
code formatting
|
2020-12-17 13:47:38 +01:00 |
Claudio Atzori
|
5a3e2199b2
|
Merge pull request 'Creation of the action set to include the bipFinder! score' (#80) from miriam.baglioni/dnet-hadoop:bipFinder into bipFinder_master_test
|
2020-12-17 12:26:38 +01:00 |
Claudio Atzori
|
03319d3bd9
|
Revert "Merge pull request 'Creation of the action set to include the bipFinder! score' (#62) from miriam.baglioni/dnet-hadoop:bipFinder into master"
This reverts commit add7e1693b, reversing
changes made to f9a8fd8bbd.
|
2020-12-17 12:23:58 +01:00 |
Claudio Atzori
|
add7e1693b
|
Merge pull request 'Creation of the action set to include the bipFinder! score' (#62) from miriam.baglioni/dnet-hadoop:bipFinder into master
|
2020-12-17 12:09:03 +01:00 |
Alessia Bardi
|
f9a8fd8bbd
|
updated test record for textgrid
|
2020-12-17 11:59:45 +01:00 |
Claudio Atzori
|
4766495f5b
|
[orcid_to_result_from_semrel_propagation] fixed typo in SQL
|
2020-12-17 09:15:50 +01:00 |
Claudio Atzori
|
de00094ebc
|
Merge pull request 'FIX on the creation of subject based broker enrichments' (#79) from broker into master
|
2020-12-15 14:58:31 +01:00 |
Michele Artini
|
f9dc1e45fd
|
fixed a bug with invalid subject topic
|
2020-12-15 14:54:11 +01:00 |
Sandro La Bruzzo
|
f92bd56f56
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-12-15 11:47:29 +01:00 |
Sandro La Bruzzo
|
1f6c8a9e83
|
added orcid_pending type to records coming from Crossref
|
2020-12-15 11:47:15 +01:00 |
Claudio Atzori
|
9f1181290e
|
Merge pull request 'broker' (#78) from broker into master
The changes look good to me.
|
2020-12-15 10:03:45 +01:00 |
Michele Artini
|
0a0f62bd01
|
Merge branch 'master' into broker
|
2020-12-15 08:30:52 +01:00 |
Michele Artini
|
12fa5d122a
|
fixed a problem with join
|
2020-12-15 08:30:26 +01:00 |
Michele Artini
|
991e675dc6
|
validation in claim rels
|
2020-12-14 15:41:25 +01:00 |
Michele Artini
|
3e19cf7b4a
|
openaireId
|
2020-12-14 15:24:33 +01:00 |
Claudio Atzori
|
b6f08ce226
|
re-adding the old junit:junit dep as solr-test-framework needs it
|
2020-12-14 15:07:31 +01:00 |
Claudio Atzori
|
7d325e2c57
|
using actual result subclasses instead of their parent class
|
2020-12-14 14:40:54 +01:00 |
Claudio Atzori
|
152916890f
|
renamed test name
|
2020-12-14 14:40:05 +01:00 |
Michele Artini
|
a203aee32a
|
ES wf properties
|
2020-12-14 12:02:33 +01:00 |
Claudio Atzori
|
1506f49052
|
Xml record serialization for author PIDs: 1) only one value per PID type is allowed; 2) orcid prevails over orcid_pending
|
2020-12-14 11:14:03 +01:00 |
Michele Artini
|
d03756c962
|
mkdir of output dir
|
2020-12-14 11:11:41 +01:00 |
Michele Artini
|
399548f221
|
whitelist of topics
|
2020-12-14 11:03:55 +01:00 |
Michele Artini
|
38da1c282a
|
Merge branch 'master' into broker
|
2020-12-14 09:14:02 +01:00 |
Dimitris
|
dc9c2f3272
|
Commit 12122020
|
2020-12-12 12:00:14 +02:00 |
Claudio Atzori
|
61cd129ded
|
XML serialisation test
|
2020-12-11 12:44:53 +01:00 |
Claudio Atzori
|
ce7a319e01
|
using the correct assertion import
|
2020-12-11 12:44:17 +01:00 |
Claudio Atzori
|
7fe2433137
|
excluded transitive older junit dependencies, they can compromise the unit test executions
|
2020-12-11 12:42:55 +01:00 |
Claudio Atzori
|
d9532446eb
|
imported more diffs from master branch; code formatting
|
2020-12-10 16:14:16 +01:00 |
Claudio Atzori
|
1eaad89a3c
|
do not fail on unknown properties when grouping entities by ID
|
2020-12-10 15:56:11 +01:00 |
Michele Artini
|
933b4c1ada
|
workingDir and outputDir
|
2020-12-10 14:47:51 +01:00 |
Michele Artini
|
2e7df07328
|
workingDir and outputDir
|
2020-12-10 14:47:22 +01:00 |
Michele Artini
|
94bfed1c84
|
gzipped output
|
2020-12-10 11:59:28 +01:00 |
Claudio Atzori
|
12e2f930c8
|
resolved conflicts
|
2020-12-10 10:57:39 +01:00 |
Miriam Baglioni
|
b7adbc7c3e
|
merge branch with master
|
2020-12-10 10:35:27 +01:00 |
Alessia Bardi
|
112da6d76a
|
in theory, just auto-formatting after mvn compile
|
2020-12-09 20:00:27 +01:00 |
Alessia Bardi
|
bece04b330
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-12-09 19:54:43 +01:00 |
Alessia Bardi
|
426b76ee8e
|
more asserts for TextGrid record
|
2020-12-09 19:46:11 +01:00 |
Claudio Atzori
|
ff72fcd91a
|
allow orcid_pending to percolate to the XML graph serialization
|
2020-12-09 19:04:50 +01:00 |
Claudio Atzori
|
4705144918
|
Merge pull request 'rel_project_validation' (#69) from rel_project_validation into master
LGTM
|
2020-12-09 19:01:20 +01:00 |
Claudio Atzori
|
211aa04726
|
allow orcid_pending to percolate to the XML graph serialization
|
2020-12-09 18:08:51 +01:00 |
Claudio Atzori
|
ada21ad920
|
Merge pull request 'dump of the results related to at least one project' (#61) from miriam.baglioni/dnet-hadoop:dump into master
LGTM
|
2020-12-09 17:22:56 +01:00 |
Claudio Atzori
|
3c5ce1dada
|
code formatting
|
2020-12-09 17:07:20 +01:00 |
Michele Artini
|
1bc9adc10d
|
default trust for validated rels
|
2020-12-09 16:18:37 +01:00 |
Claudio Atzori
|
fcd7689b50
|
promote actions: shouldGroupById parameter marked as optional (default is true)
|
2020-12-09 13:10:16 +01:00 |
Michele Artini
|
5f21a356fd
|
reindent
|
2020-12-09 11:24:30 +01:00 |
Michele Artini
|
370a5e650b
|
validation attributes in resultProject relations
|
2020-12-09 11:18:26 +01:00 |
Antonis Lempesis
|
aead9efd24
|
added the new parameter (stats_tool_api_url) in the workflow parameters
|
2020-12-09 10:45:24 +01:00 |
Antonis Lempesis
|
77a3a6d82e
|
added the new parameter (stats_tool_api_url) in the workflow parameters
|
2020-12-09 10:45:24 +01:00 |
Antonis Lempesis
|
91226117b3
|
ignoring deletedbyinference relations
|
2020-12-09 10:45:24 +01:00 |
Antonis Lempesis
|
b7f29db126
|
finished first implementation of wf
|
2020-12-09 10:45:24 +01:00 |
Antonis Lempesis
|
ded2392275
|
initial implementation of the promote wf
|
2020-12-09 10:45:24 +01:00 |
Antonis Lempesis
|
1a87a1effd
|
added last step to update cache
|
2020-12-09 10:45:24 +01:00 |
Claudio Atzori
|
27e96767e0
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-12-07 21:53:22 +01:00 |
Claudio Atzori
|
fba11eef2a
|
cleanup
|
2020-12-07 21:53:13 +01:00 |