Antonis Lempesis
236435b470
following redirects
2021-03-12 14:11:21 +02:00
Antonis Lempesis
3c75a05044
fixed a ton of typos
2021-03-12 13:47:04 +02:00
Sandro La Bruzzo
4bb3bcafa5
add author sequence number
2021-03-11 11:32:32 +01:00
Sandro La Bruzzo
a8e5d0ea0d
updated test and fixed assignment of access right
2021-03-11 10:41:24 +01:00
Sandro La Bruzzo
f5e7c57654
Fixed ticket 6282
2021-03-11 10:32:45 +01:00
Antonis Lempesis
fa1ec5b5e9
fixed typo...
2021-03-10 14:05:58 +02:00
Claudio Atzori
01630f638d
IdentifierFactory implementation based on the list of datasources authoritative for a given pid type
2021-03-09 17:11:50 +01:00
Claudio Atzori
59532b0919
[ #6281 Provenance of product PIDs] Added PIDs to the Instance type; extended mapping for OAF/ODF records
2021-03-09 11:14:45 +01:00
Claudio Atzori
d525785497
[ #6282 open access status in the Graph] Result.Instance.accessRight defined with dedicated data type that includes the open access color.
2021-03-09 11:12:55 +01:00
Sandro La Bruzzo
bbe1a7c69a
[ #6281 Provenance of product PIDs] Added PIDs to the Instance type in Scholexplorer Export
2021-03-09 10:46:36 +01:00
Sandro La Bruzzo
a2169ccf07
// implemented Ticket #6281 added pid to Instance in doiBoost
2021-03-09 10:46:36 +01:00
Claudio Atzori
f468c7f0d7
merged from master
2021-03-09 09:12:41 +01:00
Claudio Atzori
8d2bb24512
merged from master
2021-03-08 15:44:34 +01:00
Claudio Atzori
acbe3119a4
RestCollectorPlugin imported from dne45
2021-03-08 09:44:09 +01:00
Antonis Lempesis
f40c150a0d
fixed steps...
2021-03-06 00:35:57 +02:00
Claudio Atzori
fa7930d2e2
merging contributions from PR#97
2021-03-05 15:45:28 +01:00
Antonis Lempesis
6147ee4950
assigning correctly hive contexts to concepts
2021-03-05 14:12:18 +02:00
Antonis Lempesis
c5fbad8093
Contexts are now downloaded instead of using the stats_ext db
2021-03-04 00:42:21 +02:00
Claudio Atzori
55f6ff5f55
README.md for aggregation workflows
2021-03-03 16:18:34 +01:00
Claudio Atzori
e8789b0cdb
Merge pull request 'stats DB for monitor' ( #99 ) from antonis.lempesis/dnet-hadoop:master into master
...
Looks good to me, just a note on the parsing of the citations: since the last version, IIS produces citations as proper relationships among results. This is what we got already in the BETA graph
```
count r.reltype r.subreltype r.relclass
62.129.254 resultResult citation cites
62.043.309 resultResult citation isCitedBy
```
Thus, I suggest moving away from the current property-based implementation for extracting the citation links and relying on the relationships instead.
2021-03-03 10:29:09 +01:00
Claudio Atzori
36f750cd1d
removed unused classes
2021-03-03 10:22:29 +01:00
Claudio Atzori
b73dce3e3a
more logging on the MDStore mongodb client. Forcing UTF_8 encoding on the content
2021-03-03 10:17:16 +01:00
Antonis Lempesis
27796343ca
crude sleep. hardcoded value
2021-03-03 01:37:47 +02:00
Enrico Ottonello
70cb100647
added updating last orcid dataset folders after completion
2021-03-01 10:17:04 +01:00
Enrico Ottonello
bd3b16402b
added result typologies
2021-03-01 10:16:02 +01:00
Claudio Atzori
e76c4f62c1
MetadataRecord moved in dhp-schemas
2021-02-26 10:58:48 +01:00
miconis
1a85020572
bug fix in graph-mapper, changes in the implementation of the openorgs wf to create relations and populate openorgs db
2021-02-26 10:19:28 +01:00
Enrico Ottonello
ca1800510a
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
2021-02-25 18:45:02 +01:00
Enrico Ottonello
53d7023460
dateOfCollection taken from orcid last_update.txt on hdfs; cleaned wf parameters
2021-02-25 18:43:29 +01:00
Claudio Atzori
7df2461ccc
indent XML records collected from oai-pmh endpoints
2021-02-25 16:19:12 +01:00
Enrico Ottonello
d43ea88caf
aligned orcid result typologies with openaire vocabulary
2021-02-25 15:02:10 +01:00
Claudio Atzori
b830e33392
mdstore collector plugin
2021-02-25 12:30:30 +01:00
Claudio Atzori
271e88537b
code formatting
2021-02-25 12:28:56 +01:00
Claudio Atzori
9c899f4433
cleanup on transformation functions and the relative tests
2021-02-24 15:07:59 +01:00
Claudio Atzori
fc3fa5e343
implemented mdstore collector plugin
2021-02-24 15:07:24 +01:00
Enrico Ottonello
975823b968
data from last updated orcid
2021-02-23 15:35:04 +01:00
Miriam Baglioni
896919e735
merge upstream
2021-02-23 10:45:29 +01:00
Antonis Lempesis
d90767c733
correctly invalidating metadata
2021-02-19 03:18:47 +02:00
Antonis Lempesis
3681afbe04
typo
2021-02-19 03:04:27 +02:00
Antonis Lempesis
c5502eba8f
actually moved stats computation in impala instead of hive...
2021-02-19 02:54:39 +02:00
Antonis Lempesis
33c85d4e66
moved stats computation in impala instead of hive
2021-02-18 17:23:34 +02:00
Antonis Lempesis
b8e96c8ae7
moved cache update to the end
2021-02-18 16:42:22 +02:00
Antonis Lempesis
bcbfc052b1
fixed last errors in step 21
2021-02-18 16:32:54 +02:00
Antonis Lempesis
10a29a4b9a
fixes in monitor step
2021-02-18 15:05:59 +02:00
Antonis Lempesis
8ef66452d5
fixed typo
2021-02-17 22:24:44 +02:00
Antonis Lempesis
a8836e2f5f
fixed typo
2021-02-17 19:27:07 +02:00
Claudio Atzori
e7eba9f7e7
WIP: transformation workflow error reporting; cleanup
2021-02-17 16:54:08 +01:00
Claudio Atzori
58467aaf1e
WIP: transformation workflow error reporting
2021-02-17 16:14:41 +01:00
Claudio Atzori
cc88701f29
retry for any Socket exception
2021-02-17 16:13:54 +01:00
Antonis Lempesis
a445c1ac3d
fixed variable names in monitor script
2021-02-17 16:45:09 +02:00
Antonis Lempesis
00d516360f
added missing ;
2021-02-17 16:41:10 +02:00
Claudio Atzori
545f8f3e48
using jackson objectmapper instead of GSon to serialise the aggregation report
2021-02-17 12:15:00 +01:00
Claudio Atzori
b592d78bb4
WIP: collectorWorker error reporting, generalised reported implementation
2021-02-17 10:28:01 +01:00
Antonis Lempesis
cd1b794409
added the monitor db wf
2021-02-17 02:11:55 +02:00
Claudio Atzori
cf27905a71
WIP: collectorWorker error reporting, added report messages
2021-02-16 16:53:14 +01:00
Alessia Bardi
32e81c2d89
non validated rel has null value in validated field
2021-02-16 11:01:42 +01:00
Claudio Atzori
1abe6d1ad7
WIP: collectorWorker error reporting, added report messages
2021-02-15 15:08:59 +01:00
Claudio Atzori
523a6bfa97
Merge pull request 'first commit to the correct branch' ( #94 ) from andreas.czerniak/BrAggr_dnet-hadoop:hadoop_aggregator into hadoop_aggregator
...
Looks good to me, thanks Andreas!
2021-02-15 12:15:31 +01:00
Antonis Lempesis
1c029b9fc0
fixed formatting
2021-02-14 03:14:24 +02:00
Antonis Lempesis
2c4dcc90ba
analyzing tables to produce stats
2021-02-14 02:54:55 +02:00
Sandro La Bruzzo
7edcc87ed4
changed xslt behaviour on failure
2021-02-12 17:27:08 +01:00
Sandro La Bruzzo
6a37c7f175
merge fixed
2021-02-12 16:38:47 +01:00
Sandro La Bruzzo
b3f5c2351d
Merge branch 'hadoop_aggregator' of code-repo.d4science.org:D-Net/dnet-hadoop into hadoop_aggregator
...
Conflicts:
dhp-workflows/dhp-aggregation/src/test/java/eu/dnetlib/dhp/transformation/TransformationJobTest.java
2021-02-12 16:37:14 +01:00
Sandro La Bruzzo
f216277219
Implemented cleaning date
2021-02-12 16:34:52 +01:00
Andreas Czerniak
5a9017cf18
clone, min. changes, test, run
2021-02-12 14:32:36 +01:00
Claudio Atzori
aa55dedb8a
Merge branch 'hadoop_aggregator' of https://code-repo.d4science.org/D-Net/dnet-hadoop into hadoop_aggregator
2021-02-12 12:31:05 +01:00
Claudio Atzori
29c6f7e255
classes related to the collection workflow moved into common package; implemented MongoDB collection plugins
2021-02-12 12:31:02 +01:00
Sandro La Bruzzo
17e6f1934e
fixed NPE on cleaner
2021-02-12 11:48:11 +01:00
Sandro La Bruzzo
ebcc3ec14f
updated wrong datacite identifier in transformation
2021-02-11 16:25:51 +01:00
Michele Artini
83d815d0bc
only stats
2021-02-11 10:57:23 +01:00
Michele Artini
8c836bf930
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
2021-02-11 10:54:41 +01:00
Michele Artini
8c1600398a
added resumeFrom parameter
2021-02-11 10:54:16 +01:00
Claudio Atzori
3f8f78cbfb
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2021-02-11 09:36:10 +01:00
Claudio Atzori
b34b5a39ca
index field authoridtypevalue mixes up different author id-type value pairs, dropped in favour of orcidtypevalue
2021-02-11 09:36:04 +01:00
Michele Artini
7249cceb53
switch of 2 nodes
2021-02-11 09:27:08 +01:00
Alessia Bardi
986dd969d3
use the proper import for Lists
2021-02-10 12:03:54 +01:00
miconis
4b2124a18e
implementation of the openorgs wfs, implementation of the raw_all wf to migrate openorgs db entities
2021-02-10 11:51:50 +01:00
Alessia Bardi
c4d1feca74
mapper test with validated link to project
2021-02-10 11:22:54 +01:00
Alessia Bardi
09fc7e2f78
serialization of validated flag on relationships
2021-02-10 11:22:09 +01:00
Enrico Ottonello
ee4ba7298b
fix last update read/write from file on hdfs
2021-02-09 23:24:57 +01:00
Claudio Atzori
bc458d1b54
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2021-02-09 16:27:30 +01:00
Claudio Atzori
82e6c50f3f
updated solr fields (authoridtypevalue, resultsubject, resultresourcetypename)
2021-02-09 16:27:04 +01:00
Claudio Atzori
62bd3c53ee
Merge branch 'master' into provision_indexing
2021-02-09 15:46:26 +01:00
Claudio Atzori
bae029f828
collection_java_xmx allows declaring the heap size allocated for the java actions involved in the metadata collection workflow
2021-02-08 18:07:23 +01:00
Claudio Atzori
bebc54d5bf
seq file storing native records is now compressed
2021-02-08 18:06:25 +01:00
Claudio Atzori
50add4c61b
added requestDelay to HttpConnector2 configuration; Aggregation workflow constants moved in dhp-common
2021-02-08 12:19:38 +01:00
Miriam Baglioni
2f5e6647c6
merge upstream
2021-02-08 10:33:11 +01:00
Claudio Atzori
40df0f987d
better logging, WIP: collectorWorker error reporting; common functions moved in DHPUtils
2021-02-06 20:12:00 +01:00
Claudio Atzori
a8a758925e
better logging, WIP: collectorWorker error reporting
2021-02-05 19:18:05 +01:00
Claudio Atzori
730973679a
Merge branch 'hadoop_aggregator' of https://code-repo.d4science.org/D-Net/dnet-hadoop into hadoop_aggregator
2021-02-04 17:25:00 +01:00
Claudio Atzori
deb85706db
imported HttpConnector from https://svn.driver.research-infrastructures.eu/driver/dnet45/modules/dnet-modular-collector-service/trunk/src/main/java/eu/dnetlib/data/collector/plugins/HttpConnector.java as HttpConnector2
2021-02-04 17:24:52 +01:00
Sandro La Bruzzo
4dae5e605d
implemented messaging between collection worker and Dnet
2021-02-04 15:51:15 +01:00
Claudio Atzori
72c57b28fa
switched project version to 1.2.4-branch_hadoop_aggregator-SNAPSHOT
2021-02-04 14:08:18 +01:00
Claudio Atzori
40764cf626
better logging, WIP: collectorWorker error reporting
2021-02-04 14:06:02 +01:00
Enrico Ottonello
c238561001
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
2021-02-04 10:44:21 +01:00
Enrico Ottonello
465ce39f75
job execution now based on file last_update.txt on hdfs
2021-02-04 10:44:04 +01:00
Sandro La Bruzzo
69c253710b
fixed test
2021-02-04 10:30:49 +01:00
Claudio Atzori
e04045089f
better logging, WIP: collectorWorker error reporting
2021-02-03 17:58:22 +01:00
Alessia Bardi
c67329d3ad
updated test for EU Open Data portal datasets
2021-02-03 17:06:48 +01:00
Claudio Atzori
0e8a4f9f1a
better logging, WIP: collectorWorker error reporting
2021-02-03 12:33:41 +01:00
Alessia Bardi
fd705404a1
tests for EU Open Data portal dataset mapping
2021-02-03 10:28:17 +01:00
Miriam Baglioni
6190465851
merge upstream
2021-02-03 10:27:27 +01:00
Claudio Atzori
53884d12c2
code formatting
2021-02-02 14:38:03 +01:00
Claudio Atzori
ac46c247d2
code formatting
2021-02-02 14:24:00 +01:00
Claudio Atzori
bde14b149a
fixed transformation target paths
2021-02-02 12:49:29 +01:00
Claudio Atzori
ca4391aa1c
minor changes
2021-02-02 12:44:04 +01:00
Claudio Atzori
bb89b99b24
code formatting
2021-02-02 12:34:14 +01:00
Claudio Atzori
75807ea5ae
factored out constants
2021-02-02 12:28:21 +01:00
Sandro La Bruzzo
0634674add
implemented transformation test
2021-02-02 12:12:14 +01:00
Claudio Atzori
8eaa1fd4b4
WIP: metadata collection in INCREMENTAL mode and relative test
2021-02-01 19:29:10 +01:00
Sandro La Bruzzo
bead34d11a
code refactor
2021-02-01 14:58:06 +01:00
Sandro La Bruzzo
6ff234d81b
Implemented a first prototype of incremental harvesting and transformation using readlock
2021-02-01 13:56:05 +01:00
Sandro La Bruzzo
b6b835ef49
update transformation Factory to get Transformation Rule by Id and not by Title
2021-02-01 08:49:42 +01:00
Sandro La Bruzzo
e423634cb6
RollBack in case of error WORKS!!!
2021-01-29 17:21:42 +01:00
Sandro La Bruzzo
8ee82576c6
Collection on Refresh WORKS!!!
2021-01-29 17:02:46 +01:00
Sandro La Bruzzo
0276180039
WIP mdstore
...
transaction implemented on hadoop side
2021-01-29 16:42:41 +01:00
Sandro La Bruzzo
0f8e2ecce6
Merged Datacite transform into this branch
2021-01-29 10:45:07 +01:00
Sandro La Bruzzo
99cf3a8ea4
Merged Datacite transform into this branch
2021-01-28 16:34:46 +01:00
Sandro La Bruzzo
686e7b507c
Merge branch 'hadoop_aggregator' of code-repo.d4science.org:D-Net/dnet-hadoop into aggregation_on_hadoop
2021-01-28 10:02:13 +01:00
Sandro La Bruzzo
98b9498b57
Removed old messaging system not quite used from collection and Transformation workflow
...
code refactor
2021-01-28 09:51:17 +01:00
Sandro La Bruzzo
184e7b3856
Implemented new Transformation using spark
2021-01-27 15:43:08 +01:00
Sandro La Bruzzo
150a617bd1
Merge pull request 'aggregation_on_hadoop' ( #90 ) from sandro.labruzzo/dnet-hadoop:aggregation_on_hadoop into hadoop_aggregator
...
Wonderful code... You're the Best Sandro
2021-01-26 16:00:47 +01:00
Claudio Atzori
f1a852f278
align usage-stats workflow poms with latest snapshot version
2021-01-26 15:42:42 +01:00
Claudio Atzori
9c32119dc2
Merge pull request 'usage-stats-export-wf-v2' ( #89 ) from dimitris.pierrakos/dnet-hadoop:usage-stats-export-wf-v2 into master
...
Thank you Dimitris!
2021-01-26 15:01:41 +01:00
Claudio Atzori
885e0dd926
[Cleaning] filter authors not providing word characters in the fullname
2021-01-26 09:48:53 +01:00
Claudio Atzori
2890511613
[Cleaning] normalise missing Result.country
2021-01-26 09:41:44 +01:00
Claudio Atzori
4eb9ed35b1
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2021-01-25 18:12:24 +01:00
Claudio Atzori
cd379eb5e3
[Cleaning] trying to avoid NPEs, this time by ruling out authors without a defined fullname
2021-01-25 18:11:49 +01:00
Alessia Bardi
505477f36f
format code
2021-01-25 18:02:49 +01:00
Alessia Bardi
ded6ed8d7d
no ',' author if there are no authors in ODF records
2021-01-25 17:57:51 +01:00
Claudio Atzori
3465c8ccee
[Cleaning] trying to avoid NPEs
2021-01-25 16:54:53 +01:00
Sandro La Bruzzo
a54848a59c
Moved Vocabulary stuff to common module
2021-01-25 15:43:04 +01:00
Sandro La Bruzzo
ffb092b8d3
removed duplicate code HttpConnector.java
2021-01-25 15:05:37 +01:00
Sandro La Bruzzo
cda210a2ca
changed documentation since it didn't reflect the current status
2021-01-25 14:17:42 +01:00
Claudio Atzori
07a0ccfc96
[Cleaning] trying to avoid NPEs
2021-01-25 13:36:01 +01:00
miconis
c7e2d5a59a
minor changes
2021-01-25 12:40:45 +01:00
Claudio Atzori
34d653de41
[Cleaning] updated cleaning rule for DOIs
2021-01-22 14:16:33 +01:00
Miriam Baglioni
fe36895c53
added datasource blacklist for the organization to result propagation through institutional repositories
2021-01-22 11:55:10 +01:00
miconis
8fea29177c
refactoring, minor changes and implementation of the wf for openorgs with integration of organization phases into the scan wf
2021-01-18 16:48:08 +01:00
Dimitris
3e8d2a6b2d
Clean workflows
2021-01-15 16:19:12 +02:00
Michele Artini
cfbcdc95bc
fixed a wf param
2021-01-14 14:45:23 +01:00
Michele Artini
69ba3203c0
fixed a conflict
2021-01-14 14:43:25 +01:00
Michele Artini
b230d44411
fixed conflict
2021-01-14 14:32:31 +01:00
Michele Artini
b9d90e95b8
Added eventId to ShortEventMessage
2021-01-14 14:32:31 +01:00
Michele Artini
64b0b0bfb3
fixed a bug with invalid subject topic
2021-01-14 14:32:31 +01:00
Michele Artini
e3e0ab1de1
fixed a problem with join
2021-01-14 14:32:31 +01:00
Michele Artini
26a941315a
openaireId
2021-01-14 14:32:31 +01:00
Michele Artini
6f4d1a37f0
ES wf properties
2021-01-14 14:32:31 +01:00
Michele Artini
1391341d06
mkdir of output dir
2021-01-14 14:32:31 +01:00
Michele Artini
3c9cbd19f3
whitelist of topics
2021-01-14 14:32:31 +01:00
Michele Artini
467aa77279
workingDir and outputDir
2021-01-14 14:32:31 +01:00
Michele Artini
10f3f7eca7
workingDir and outputDir
2021-01-14 14:32:31 +01:00
Michele Artini
ff41a7b3a4
gzipped output
2021-01-14 14:32:31 +01:00
Claudio Atzori
80cf55ef2e
[Broker] fixed partitionEventsByOpendoarIds workflow parameter names
2021-01-13 16:24:30 +01:00
Claudio Atzori
41500669e2
[BIP! Scores integration] merged missing classes from bipFinder branch
2021-01-11 14:39:47 +01:00
Claudio Atzori
2a7a10809e
[BIP! Scores integration] merged missing classes from bipFinder branch
2021-01-11 10:05:02 +01:00
Claudio Atzori
d6686dd7cf
merged from master
2021-01-08 18:16:12 +01:00
Claudio Atzori
34229970e6
[BIP! Scores integration] Create updates as Result rather than subclasses; Result considers also metrics in the mergeFrom operation
2021-01-08 16:29:17 +01:00
Claudio Atzori
1361c9eb0c
[BIP! Scores integration] Create updates as Result rather than subclasses; Result considers also metrics in the mergeFrom operation
2021-01-07 10:07:30 +01:00
Claudio Atzori
ab2fe9266a
[DOIBoost] minor fixes in workflow definition
2021-01-05 10:26:39 +01:00
Claudio Atzori
7c722f3fdc
[DOIBoost] fixed typo
2021-01-05 10:25:54 +01:00
Claudio Atzori
8879704ba0
[DOIBoost] configurable ES server url and index name in crossref importer
2021-01-05 10:00:13 +01:00
Claudio Atzori
26e9d55c13
code formatting
2021-01-05 09:59:26 +01:00
Sandro La Bruzzo
7834a35768
avoid saving the intermediate dataset before generating the Sequence file
2021-01-04 17:54:57 +01:00
Sandro La Bruzzo
e79445a8b4
minor fix for claudio polemica
2021-01-04 17:39:25 +01:00
Sandro La Bruzzo
8765020b85
minor fix
2021-01-04 17:37:08 +01:00
Sandro La Bruzzo
b0dc92786f
defined a single oozie workflow for the generation of doiboost
2021-01-04 17:01:35 +01:00
Claudio Atzori
7185158942
ignore missing properties
2020-12-29 11:06:28 +01:00
Claudio Atzori
28460c2cd1
using com.fasterxml.jackson.databind.ObjectMapper instead of org.codehaus.jackson.map.ObjectMapper
2020-12-23 16:59:52 +01:00
Claudio Atzori
60649ac7d2
swapped expected and actual in tests, updated expected number of authors
2020-12-23 12:26:04 +01:00
Claudio Atzori
723b01f9e9
trivial: the fewer magic numbers and values around, the better
2020-12-23 12:22:48 +01:00
Claudio Atzori
7bfc35df5e
Merge pull request 'Changed typo in script names' ( #82 ) from antonis.lempesis/dnet-hadoop:master into master
...
no need to! :)
2020-12-22 12:36:21 +01:00
Antonis Lempesis
be5969a8c2
Changed typo in script names
2020-12-22 13:33:32 +02:00
miconis
1e1aab83e3
implementation of the raw wf for openorgs: still not complete, some functionalities are missing
2020-12-21 11:58:21 +01:00
Claudio Atzori
6cb0dc3f43
extended ORCID cleaning procedure
2020-12-21 11:40:17 +01:00
Claudio Atzori
573a8a3272
Merge pull request 'Changed typo in script names' ( #81 ) from antonis.lempesis/dnet-hadoop:master into master
...
ok! LGTM
2020-12-18 17:44:26 +01:00
Antonis Lempesis
2a074c3b2b
Changed typo in script names
2020-12-18 18:40:48 +02:00
Claudio Atzori
47270d9af5
lenient mock can be lenient
2020-12-18 15:38:59 +01:00
Claudio Atzori
2e503ee101
code formatting
2020-12-17 13:47:38 +01:00
Claudio Atzori
5a3e2199b2
Merge pull request 'Creation of the action set to include the bipFinder! score' ( #80 ) from miriam.baglioni/dnet-hadoop:bipFinder into bipFinder_master_test
2020-12-17 12:26:38 +01:00
Claudio Atzori
03319d3bd9
Revert "Merge pull request 'Creation of the action set to include the bipFinder! score' ( #62 ) from miriam.baglioni/dnet-hadoop:bipFinder into master"
...
This reverts commit add7e1693b, reversing changes made to f9a8fd8bbd.
2020-12-17 12:23:58 +01:00
Claudio Atzori
add7e1693b
Merge pull request 'Creation of the action set to include the bipFinder! score' ( #62 ) from miriam.baglioni/dnet-hadoop:bipFinder into master
2020-12-17 12:09:03 +01:00
Alessia Bardi
f9a8fd8bbd
updated test record for textgrid
2020-12-17 11:59:45 +01:00
Claudio Atzori
4766495f5b
[orcid_to_result_from_semrel_propagation] fixed typo in SQL
2020-12-17 09:15:50 +01:00
Claudio Atzori
de00094ebc
Merge pull request 'FIX on the creation of subject based broker enrichments' ( #79 ) from broker into master
2020-12-15 14:58:31 +01:00
Michele Artini
f9dc1e45fd
fixed a bug with invalid subject topic
2020-12-15 14:54:11 +01:00
Sandro La Bruzzo
f92bd56f56
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
2020-12-15 11:47:29 +01:00
Sandro La Bruzzo
1f6c8a9e83
added orcid_pending type to records coming from Crossref
2020-12-15 11:47:15 +01:00
Enrico Ottonello
b2de598c1a
all actions from download lambda file to merge updated data into one wf
2020-12-15 10:42:55 +01:00
Claudio Atzori
9f1181290e
Merge pull request 'broker' ( #78 ) from broker into master
...
The changes look good to me.
2020-12-15 10:03:45 +01:00
Michele Artini
0a0f62bd01
Merge branch 'master' into broker
2020-12-15 08:30:52 +01:00
Michele Artini
12fa5d122a
fixed a problem with join
2020-12-15 08:30:26 +01:00
Michele Artini
991e675dc6
validation in claim rels
2020-12-14 15:41:25 +01:00
Michele Artini
3e19cf7b4a
openaireId
2020-12-14 15:24:33 +01:00
Claudio Atzori
b6f08ce226
re-adding the old junit:junit dep as solr-test-framework needs it
2020-12-14 15:07:31 +01:00
Claudio Atzori
7d325e2c57
using actual result subclasses instead of their parent class
2020-12-14 14:40:54 +01:00
Claudio Atzori
152916890f
renamed test name
2020-12-14 14:40:05 +01:00
Michele Artini
a203aee32a
ES wf properties
2020-12-14 12:02:33 +01:00
Claudio Atzori
1506f49052
Xml record serialization for author PIDs: 1) only one value per PID type is allowed; 2) orcid prevails over orcid_pending
2020-12-14 11:14:03 +01:00
Michele Artini
d03756c962
mkdir of output dir
2020-12-14 11:11:41 +01:00
Michele Artini
399548f221
whitelist of topics
2020-12-14 11:03:55 +01:00
Michele Artini
38da1c282a
Merge branch 'master' into broker
2020-12-14 09:14:02 +01:00
Dimitris
dc9c2f3272
Commit 12122020
2020-12-12 12:00:14 +02:00
Enrico Ottonello
efe4c2a9c5
authors and works are now updated in two separate spark actions of the wf
2020-12-12 02:06:21 +01:00
Enrico Ottonello
858efbfad1
fix dataset creation for downloaded works
2020-12-11 16:49:54 +01:00
Claudio Atzori
61cd129ded
XML serialisation test
2020-12-11 12:44:53 +01:00
Claudio Atzori
ce7a319e01
using the correct assertion import
2020-12-11 12:44:17 +01:00
Claudio Atzori
7fe2433137
excluded transitive older junit dependencies, they can compromise the unit test executions
2020-12-11 12:42:55 +01:00
Claudio Atzori
d9532446eb
imported more diffs from master branch; code formatting
2020-12-10 16:14:16 +01:00
Claudio Atzori
1eaad89a3c
do not fail on unknown properties when grouping entities by ID
2020-12-10 15:56:11 +01:00
Michele Artini
933b4c1ada
workingDir and outputDir
2020-12-10 14:47:51 +01:00
Michele Artini
2e7df07328
workingDir and outputDir
2020-12-10 14:47:22 +01:00
Michele Artini
94bfed1c84
gzipped output
2020-12-10 11:59:28 +01:00
Claudio Atzori
12e2f930c8
resolved conflicts
2020-12-10 10:57:39 +01:00
Miriam Baglioni
b7adbc7c3e
merge branch with master
2020-12-10 10:35:27 +01:00
Alessia Bardi
112da6d76a
in theory, just auto-formatting after mvn compile
2020-12-09 20:00:27 +01:00
Alessia Bardi
bece04b330
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2020-12-09 19:54:43 +01:00
Alessia Bardi
426b76ee8e
more asserts for TextGrid record
2020-12-09 19:46:11 +01:00
Claudio Atzori
ff72fcd91a
allow orcid_pending to percolate to the XML graph serialization
2020-12-09 19:04:50 +01:00
Claudio Atzori
4705144918
Merge pull request 'rel_project_validation' ( #69 ) from rel_project_validation into master
...
LGTM
2020-12-09 19:01:20 +01:00
Claudio Atzori
211aa04726
allow orcid_pending to percolate to the XML graph serialization
2020-12-09 18:08:51 +01:00
Claudio Atzori
ada21ad920
Merge pull request 'dump of the results related to at least one project' ( #61 ) from miriam.baglioni/dnet-hadoop:dump into master
...
LGTM
2020-12-09 17:22:56 +01:00
Claudio Atzori
3c5ce1dada
code formatting
2020-12-09 17:07:20 +01:00
Michele Artini
1bc9adc10d
default trust for validated rels
2020-12-09 16:18:37 +01:00
Claudio Atzori
fcd7689b50
promote actions: shouldGroupById parameter marked as optional (default is true)
2020-12-09 13:10:16 +01:00
Michele Artini
5f21a356fd
reindent
2020-12-09 11:24:30 +01:00
Michele Artini
370a5e650b
validation attributes in resultProject relations
2020-12-09 11:18:26 +01:00
Antonis Lempesis
aead9efd24
added the new parameter (stats_tool_api_url) in the workflow parameters
2020-12-09 10:45:24 +01:00
Antonis Lempesis
77a3a6d82e
added the new parameter (stats_tool_api_url) in the workflow parameters
2020-12-09 10:45:24 +01:00
Antonis Lempesis
91226117b3
ignoring deletedbyinference relations
2020-12-09 10:45:24 +01:00
Antonis Lempesis
b7f29db126
finished first implementation of wf
2020-12-09 10:45:24 +01:00
Antonis Lempesis
ded2392275
initial implementation of the promote wf
2020-12-09 10:45:24 +01:00
Antonis Lempesis
1a87a1effd
added last step to update cache
2020-12-09 10:45:24 +01:00
Enrico Ottonello
2233750a37
original orcid xml data are stored in a field of the class that models orcid data
2020-12-09 09:45:19 +01:00
Claudio Atzori
27e96767e0
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2020-12-07 21:53:22 +01:00
Claudio Atzori
fba11eef2a
cleanup
2020-12-07 21:53:13 +01:00
Sandro La Bruzzo
7f8b93de72
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
2020-12-07 19:59:39 +01:00
Sandro La Bruzzo
302baab67b
fixed doiboost mapping and workflows
2020-12-07 19:59:33 +01:00
Enrico Ottonello
5c65e602d3
wf doi_authors generates one json record for each row
2020-12-07 15:28:10 +01:00
Michele Artini
d6934f370e
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
2020-12-07 14:56:23 +01:00
Michele Artini
5de8a7276f
wf to partition opendoar events
2020-12-07 14:56:06 +01:00
Claudio Atzori
5e8509bef7
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2020-12-07 13:50:08 +01:00
Claudio Atzori
026ad40633
disabled test
2020-12-07 13:50:01 +01:00
Claudio Atzori
21ddcf3a73
actions promotion can optionally avoid grouping objects by id (configured via shouldGroupById parameter)
2020-12-07 13:45:18 +01:00
Enrico Ottonello
fa1855a4b8
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
2020-12-07 11:02:59 +01:00
Enrico Ottonello
b1b589ada1
wf to generate orcid dataset
2020-12-07 11:02:32 +01:00
Sandro La Bruzzo
620e585b63
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
2020-12-07 10:42:53 +01:00
Sandro La Bruzzo
b31dd126fb
fixed crossref workflow added common ORCID Class
2020-12-07 10:42:38 +01:00
Enrico Ottonello
8812ab65e1
completed download function to wf; added accumulators
2020-12-04 21:13:49 +01:00
Claudio Atzori
a104a632df
cleanup
2020-12-04 16:32:47 +01:00
Claudio Atzori
5b4e1142a8
Merge pull request 'added last step to update cache' ( #64 ) from antonis.lempesis/dnet-hadoop:master into master
...
Looks good to me, thanks!
2020-12-04 14:42:31 +01:00
Antonis Lempesis
b1ed1afdcc
added the new parameter (stats_tool_api_url) in the workflow parameters
2020-12-04 13:07:18 +02:00
Antonis Lempesis
7cb113e088
added the new parameter (stats_tool_api_url) in the workflow parameters
2020-12-04 13:04:25 +02:00
Antonis Lempesis
d23ccae0d5
ignoring deletedbyinference relations
2020-12-04 12:42:17 +02:00
Miriam Baglioni
5fb65ffc4a
merge branch with master
2020-12-03 11:24:35 +01:00
Miriam Baglioni
ea88dc3401
fixed issue in property name
2020-12-03 11:24:23 +01:00
Miriam Baglioni
4c58bd1c93
merge with upstream
2020-12-03 11:24:00 +01:00
Miriam Baglioni
05c452f58d
merge with upstream
2020-12-03 10:26:45 +01:00
Enrico Ottonello
53b22c1937
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
2020-12-02 23:21:27 +01:00
Enrico Ottonello
1b1e9ea67c
wf to generate doi_author_list for doiboost; wf to download updated works
2020-12-02 23:20:16 +01:00
Antonis Lempesis
413afcfed5
finished first implementation of wf
2020-12-02 15:57:17 +02:00
Antonis Lempesis
0948536614
initial implementation of the promote wf
2020-12-02 15:41:56 +02:00
Sandro La Bruzzo
7da679542f
fixed wrong projectId
2020-12-02 14:28:09 +01:00
Sandro La Bruzzo
6ba8037cc7
fixed test failure due to changed input
2020-12-02 11:34:46 +01:00
Claudio Atzori
cfb55effd9
code formatting
2020-12-02 11:23:49 +01:00
Claudio Atzori
74242e450e
using constants from ModelConstants
2020-12-02 11:23:35 +01:00
Miriam Baglioni
d5efa6963a
using constants in ModelConstants
2020-12-02 11:20:26 +01:00
Miriam Baglioni
cd285e98bc
using the constants defined in the ModelConstants class
2020-12-02 11:13:23 +01:00
Miriam Baglioni
4b0d1530a2
merge upstream
2020-12-02 11:05:00 +01:00
Claudio Atzori
faa977df7e
Merge pull request 'orcid-no-doi' ( #43 ) from enrico.ottonello/dnet-hadoop:orcid-no-doi into master
...
The dataset was generated and is now part of the actionsets available in BETA
2020-12-02 10:55:12 +01:00
Claudio Atzori
57f448b7a4
graph cleaning workflow separate orcid_pending from orcid, depending on the author pid provenance
2020-12-02 10:44:05 +01:00
Alessia Bardi
2d15667b4a
testing XML generation from json object (case AMS ACTA)
2020-12-02 10:16:26 +01:00
Alessia Bardi
a417624670
tests for raw graph mapping
2020-12-02 10:15:26 +01:00
Claudio Atzori
893ac4a77b
GenerateEntitiesApplication can be configured to hash the id value or not
2020-12-02 09:30:06 +01:00
Miriam Baglioni
f8468c9c22
added extension for new author pid (orcid_pending)
2020-12-01 20:09:35 +01:00
Miriam Baglioni
888175baf7
added java doc
2020-12-01 18:36:29 +01:00
Miriam Baglioni
3d62d99d5d
fixed issue in workflow variable
2020-12-01 15:02:49 +01:00
Miriam Baglioni
17680296b9
removed unnecessary variable and unused method
2020-12-01 15:02:31 +01:00
Miriam Baglioni
5b3ed70808
refactoring
2020-12-01 14:31:34 +01:00
Miriam Baglioni
62ff4999e3
added workflow and last step of collection and save
2020-12-01 14:30:56 +01:00
Miriam Baglioni
45d06c45c7
collecting all the atomic actions for result type and save them all in the AS path
2020-12-01 14:29:18 +01:00
Miriam Baglioni
0051ebede5
extending test
2020-12-01 12:43:03 +01:00
Miriam Baglioni
719da15f04
added test resources
2020-12-01 12:42:30 +01:00
Miriam Baglioni
db36e11912
classes, test classes and resources for production of the actionset to include bipFinder score in results
2020-11-30 20:14:23 +01:00
Enrico Ottonello
f2df3ead74
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
2020-11-30 14:22:46 +01:00
Enrico Ottonello
40c4559e92
added datainfo on authors pid with "sysimport:crosswalk:entityregistry",
2020-11-30 14:19:22 +01:00
Claudio Atzori
2c407e775e
GenerateEntitiesApplication can be configured to hash the id value or not
2020-11-30 12:00:38 +01:00
Antonis Lempesis
815d6b25d9
added last step to update cache
2020-11-30 00:48:10 +02:00
Claudio Atzori
758d27745d
cleaning tab characters from text fields
2020-11-27 16:07:24 +01:00
Claudio Atzori
e731a7658d
cleaning texts to remove tab characters too
2020-11-27 09:00:04 +01:00
Claudio Atzori
5151850a19
CROSSREF and DATACITE constants moved in common ModelConstants
2020-11-26 13:08:36 +01:00
Claudio Atzori
a104d2b6ad
cleanup
2020-11-26 11:12:00 +01:00
Claudio Atzori
d0d5525d40
minor changes
2020-11-26 11:04:17 +01:00
Claudio Atzori
13eae4b31e
GroupEntitiesSparkJob must read all graph paths but relations
2020-11-26 11:04:01 +01:00
Claudio Atzori
76363a8512
SimpleDateFormat is not thread safe; improved error reporting in case of invalid dates
2020-11-26 11:03:12 +01:00
Claudio Atzori
c1b9a4045a
grouping of records will be performed by the dedup workflow
2020-11-26 10:59:10 +01:00
Miriam Baglioni
124591a7f3
refactoring
2020-11-25 18:23:28 +01:00
Miriam Baglioni
1a89f8211c
D-Net/dnet-hadoop#61 (comment)
2020-11-25 18:12:40 +01:00
Miriam Baglioni
5fbe54ef54
D-Net/dnet-hadoop#61 (comment)
2020-11-25 18:10:28 +01:00
Miriam Baglioni
ed01e5a5e1
D-Net/dnet-hadoop#61 (comment)
2020-11-25 18:09:34 +01:00
Miriam Baglioni
d4ddde2ef2
changed because of D-Net/dnet-hadoop#61 (comment)
2020-11-25 18:01:01 +01:00
Miriam Baglioni
f5e5e92a10
changed because of D-Net/dnet-hadoop#61 (comment)
2020-11-25 17:58:53 +01:00
Miriam Baglioni
1df94b85b4
changed because of D-Net/dnet-hadoop#61 (comment)
2020-11-25 17:57:43 +01:00
Claudio Atzori
db0181b8af
Merge pull request 'added bidirectionality to relations from project and result coming from crossref' ( #60 ) from miriam.baglioni/dnet-hadoop:sxBidirectionality into master
2020-11-25 17:17:40 +01:00
Sandro La Bruzzo
ec3e238de6
Fixed problem on duplicated identifier
2020-11-25 17:15:54 +01:00
Claudio Atzori
e208b03755
renamed workflow
2020-11-25 14:55:50 +01:00
Claudio Atzori
dfd6205b95
Consistency graph workflow merges all the entities by ID
2020-11-25 14:55:32 +01:00
Miriam Baglioni
90d4369fd2
added test to verify the compression in writing community info on hdfs
2020-11-25 14:34:58 +01:00
Miriam Baglioni
6750e33d69
merge branch with master
2020-11-25 14:09:01 +01:00
Miriam Baglioni
b2c455f883
added java doc
2020-11-25 14:08:09 +01:00
Miriam Baglioni
1f130cdf92
changed the relation (produces -> isProducedBy) due to the change in the code
2020-11-25 14:04:26 +01:00
Miriam Baglioni
e758d5d9b4
refactoring
2020-11-25 13:46:39 +01:00
Miriam Baglioni
87a9f616ae
refactoring and addition of the funder nsp first part as name for the dump instead of the whole nsp
2020-11-25 13:45:41 +01:00
Miriam Baglioni
e7e418e444
added decision node to verify whether to upload to Zenodo
2020-11-25 13:44:10 +01:00
Miriam Baglioni
305e3d0c9c
added resource file for relation with relClass = isProducedBy
2020-11-25 13:43:41 +01:00
Miriam Baglioni
21ce175d17
added FilterFunction specification in filter operation
2020-11-25 13:42:31 +01:00
Miriam Baglioni
bde6d337dd
test classes for dump of results related to funders
2020-11-25 13:42:01 +01:00
Miriam Baglioni
b37b9352d7
added constant value for semantic relationship between projects and results
2020-11-25 13:41:08 +01:00
Sandro La Bruzzo
264723ffd8
updated stuff for zenodo upload
2020-11-25 11:56:07 +01:00
Claudio Atzori
36173c13a5
reverted filters in the cleaning process
2020-11-25 10:24:42 +01:00
Claudio Atzori
eeebd5a920
Cleaning workflow: remove newlines from titles, descriptions, subjects
2020-11-24 18:40:25 +01:00
Claudio Atzori
e1a1bb3ee4
moved class CleaningFunctions in the correct package. Remove newlines from titles, descriptions, subjects
2020-11-24 18:34:03 +01:00
Enrico Ottonello
99a086f0c6
max concurrent executors set to 10, according to ORCID Director of Technology mail request
2020-11-24 17:49:32 +01:00
Miriam Baglioni
72bb0fe360
changed directory name
2020-11-24 16:47:07 +01:00
Miriam Baglioni
00874a8ce6
added bidirectionality to relations from project and result
2020-11-24 15:17:23 +01:00
Miriam Baglioni
39f4a20873
changed the path and the name for saving the communities_infrastructures dump file
2020-11-24 14:47:32 +01:00
Miriam Baglioni
7e14452a87
final version of the wf to get the dump of results associated to at least one funder per funder
2020-11-24 14:46:34 +01:00
Miriam Baglioni
c167a18057
added new parameter for the dumpType
2020-11-24 14:45:50 +01:00
Miriam Baglioni
54a309bb6b
refactoring
2020-11-24 14:45:30 +01:00
Miriam Baglioni
35ecea8842
changed to consider the modification for the specification of the type of dump
2020-11-24 14:45:15 +01:00
Miriam Baglioni
b9b6bdb2e6
fixing issue on previous implementation
2020-11-24 14:44:53 +01:00
Miriam Baglioni
7e940f1991
changed to consider the modification for the specification of the type of dump
2020-11-24 14:43:34 +01:00
Miriam Baglioni
62928ef7a5
changed to save the communities_infrastructures information as the other entity dumps: in a json.gz file
2020-11-24 14:42:41 +01:00
Claudio Atzori
33bae02451
reverted behaviour of the cleaning workflow: grouping entities by ID will be managed differently
2020-11-24 14:42:33 +01:00
Miriam Baglioni
3319440c53
changed the direction of the relation between projects and result considered to select the results linked to projects
2020-11-24 14:41:09 +01:00
Miriam Baglioni
00c377dac2
added specification of MapFunction types in map
2020-11-24 14:40:22 +01:00
Miriam Baglioni
44db258dc4
added enumerated for the dump type
2020-11-24 14:38:06 +01:00
Miriam Baglioni
1832708c42
replaced boolean variable with a string one which specifies the type of dump we are performing: complete, community or funder
2020-11-24 14:37:36 +01:00
Enrico Ottonello
5c17e768b2
set wf configuration with spark.dynamicAllocation.maxExecutors 20 over 20 input partitions
2020-11-23 16:01:23 +01:00
Enrico Ottonello
5c9a727895
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
2020-11-23 09:49:53 +01:00
Enrico Ottonello
97c8111847
action to convert lambda file in seq file; spark action to download updated authors
2020-11-23 09:49:22 +01:00
Miriam Baglioni
259c67ce36
fixed issue in path name
2020-11-20 12:32:23 +01:00
Miriam Baglioni
0a9db67eec
-
2020-11-20 12:21:33 +01:00
Miriam Baglioni
d362f2637d
merge branch with master
2020-11-19 19:17:20 +01:00
Miriam Baglioni
cf3f47563f
new parameter files
2020-11-19 19:16:05 +01:00
Miriam Baglioni
24c56fa7a3
new logic and workflow for dump of results with link to projects. In this implementation the result match the model of the communityresult.
2020-11-19 19:15:39 +01:00
Claudio Atzori
d48f388fb2
Merge branch 'provision_indexing'
2020-11-19 15:59:55 +01:00
Claudio Atzori
46bde9c13f
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2020-11-19 15:26:27 +01:00
Claudio Atzori
7c9feaf9e7
project attributes removed from the XML record serialization: contactfullname, contactfax, contactphone, contactemail
2020-11-19 15:26:20 +01:00
Claudio Atzori
fcbb05eb21
cleanup
2020-11-19 15:14:33 +01:00
Claudio Atzori
3f34757c63
merged from master
2020-11-19 14:34:54 +01:00
Michele Artini
293da47ad9
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
2020-11-19 10:42:31 +01:00
Michele Artini
ab08d12c46
considering abstract > MIN_LENGTH in ENRICH_MISSING_ABSTRACT
2020-11-19 10:42:10 +01:00
Claudio Atzori
e503271abe
fixed notification workflow name
2020-11-19 10:41:38 +01:00
Claudio Atzori
0374d34c3e
introduced configuration param outputFormat: HDFS | SOLR
2020-11-19 10:34:28 +01:00
Miriam Baglioni
fafb688887
-
2020-11-18 18:56:48 +01:00
Miriam Baglioni
906db690d2
-
2020-11-18 17:43:08 +01:00
Claudio Atzori
ede7fae6c8
Merge pull request 'XML record indexing test' ( #58 ) from provision_indexing into master
2020-11-18 17:04:34 +01:00
Miriam Baglioni
5402062ff5
changed parameter file with the one associated to the job
2020-11-18 16:58:20 +01:00
Miriam Baglioni
a172a37ad1
fixed typo
2020-11-18 16:55:07 +01:00
Miriam Baglioni
46ba3793f6
code, workflow and parameters for the dump of the results associated to funders
2020-11-18 16:47:31 +01:00
Claudio Atzori
5218718e8b
updated set of fields from the MDFormatDSResourceType on PROD
2020-11-18 15:00:41 +01:00
Claudio Atzori
d9e07a242b
extended XmlIndexingJob to accept an optional parameter: outputPath. When present, forces the job to write its output on the specified HDFS location
2020-11-18 14:34:55 +01:00
Claudio Atzori
29dcff0f34
spark complains about missing classes, so here they are again
2020-11-18 14:32:32 +01:00
Miriam Baglioni
57cac36898
changed the workflow name
2020-11-18 13:38:03 +01:00
Claudio Atzori
12acf25519
Merge pull request 'starting from first step...' ( #57 ) from antonis.lempesis/dnet-hadoop:master into master
No judging. Just re-deploying...
2020-11-18 11:01:49 +01:00
Claudio Atzori
8177ce7939
test for XmlIndexingJob based on a local miniSolrCluster
2020-11-18 10:58:05 +01:00
Alessia Bardi
10e673660f
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2020-11-18 10:01:23 +01:00
Alessia Bardi
be7b310cef
rel semantics ignore case
2020-11-18 10:01:20 +01:00
Michele Artini
33da2e3d6c
xpaths for dateOfCollection and dateOfTransformation
2020-11-18 09:26:20 +01:00
Antonis Lempesis
01a6e03989
starting from first step...
2020-11-17 23:26:47 +02:00
Alessia Bardi
8f87020a50
#56 : map relevantDates from aggregated ODF records
2020-11-17 18:42:09 +01:00
Alessia Bardi
7e0a76a8ac
test for TextGrid
2020-11-17 18:39:25 +01:00
Enrico Ottonello
2b0c9bbb7e
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
2020-11-17 18:24:34 +01:00
Enrico Ottonello
c0c2e05eae
added wf to extract authors and works xml data from orcid dump to hdfs; added wf to download the lambda file (containing last orcid update information) from orcid to hdfs
2020-11-17 18:23:12 +01:00
Claudio Atzori
cfc01f136e
PID filtering based on a blacklist
2020-11-17 12:27:06 +01:00
Dimitris
bbcf6b7c8b
Commit 17112020
2020-11-17 08:36:51 +02:00
Enrico Ottonello
c796adae24
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
2020-11-16 11:57:19 +01:00
Claudio Atzori
6ab1ce53c9
fixed condition in result pid cleaning; cleanup
2020-11-16 10:09:17 +01:00
Claudio Atzori
4de8c8b237
fixed workflow variable name
2020-11-16 10:03:11 +01:00
Dimitris
3e24c9b176
Changes 14112020
2020-11-14 18:42:07 +02:00
Claudio Atzori
331d621800
added test resource
2020-11-14 12:16:15 +01:00
Claudio Atzori
5d4e34e26a
fixed typo in variable name
2020-11-14 10:32:26 +01:00
Claudio Atzori
768bc5304c
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2020-11-13 15:40:34 +01:00
Claudio Atzori
93f7b7974f
Merge pull request 'trust truncated to 3 decimals' ( #24 ) from trunc_trust into master
LGTM
2020-11-13 15:40:02 +01:00
Claudio Atzori
528231a287
grouping graph entities by id turned out to be an easy extension for the already existing cleaning workflow
2020-11-13 15:37:48 +01:00
Enrico Ottonello
005f849674
added compression to output dataset
2020-11-13 12:45:31 +01:00
Enrico Ottonello
9a2fa9dc2f
added test for other names parsing from summaries dump
2020-11-13 10:25:34 +01:00
Claudio Atzori
2bed29eb09
WIP: added oozie workflow for grouping graph entities by id
2020-11-13 10:05:12 +01:00
Claudio Atzori
13e36a4da0
WIP: added oozie workflow for grouping graph entities by id
2020-11-13 10:05:02 +01:00
Enrico Ottonello
13f28fa225
moved AuthorData to dhp-schemas; added other names to author data
2020-11-12 17:43:32 +01:00
Enrico Ottonello
2af21150c5
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
2020-11-12 09:58:33 +01:00
Claudio Atzori
9b0fb9e958
merged from master
2020-11-12 09:27:12 +01:00
Claudio Atzori
75324ae58a
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2020-11-12 09:23:37 +01:00
Claudio Atzori
822971f54f
no need to filter relations in CreateRelatedEntitiesJob_phase1; replaced 'left outer' join with 'left' join in CreateRelatedEntitiesJob_phase2; cleanup;
2020-11-12 09:22:59 +01:00
Enrico Ottonello
1f861f2b0d
now wf output is a sequence file with the format seq("eu.dnetlib.dhp.schema.oaf.Publication",eu.dnetlib.dhp.schema.action.AtomicActions)
2020-11-11 17:38:50 +01:00
Claudio Atzori
9841488482
Merge pull request 'latest changes in stats wf' ( #54 ) from antonis.lempesis/dnet-hadoop:master into master
LGTM, thanks!
2020-11-11 16:01:51 +01:00
Antonis Lempesis
99ebaee347
fixed #5913
2020-11-11 16:56:46 +02:00
Claudio Atzori
e3d3481fb9
Merge pull request 'organizations pids' ( #53 ) from organization_pids into master
LGTM
2020-11-11 14:08:25 +01:00
Antonis Lempesis
f14e65f6a3
reverted wrong change
2020-11-10 17:23:04 +02:00
Antonis Lempesis
c02c7741c9
fixes in db creation
2020-11-10 17:11:30 +02:00
Antonis Lempesis
e603fa5847
fixes in db creation
2020-11-10 17:11:12 +02:00
Enrico Ottonello
fea2451658
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
2020-11-10 11:49:43 +01:00
Claudio Atzori
18d9aad70c
improved documentation in dhp-graph-provision
2020-11-10 11:48:55 +01:00
Enrico Ottonello
1513174d7e
added further test case
2020-11-10 11:44:55 +01:00
Michele Artini
40160d171f
organizations pids
2020-11-09 12:58:36 +01:00
Sandro La Bruzzo
8e1d43aab2
Implemented ID generation using IdentifierRecordFactory on DOIBoost
2020-11-09 11:53:55 +01:00
Sandro La Bruzzo
027ef2326c
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
2020-11-06 17:12:42 +01:00
Sandro La Bruzzo
cd27df91a1
fixed bug on missing relation in ANDS
2020-11-06 17:12:31 +01:00
Enrico Ottonello
6bc7dbeca7
first version of dataset successfully generated from orcid dump 2020
2020-11-06 13:47:50 +01:00
Claudio Atzori
d10447e747
re-packaged graph dump workflow sources
2020-11-05 17:38:18 +01:00
Claudio Atzori
2d76497488
cleanup
2020-11-05 17:10:24 +01:00
Miriam Baglioni
f8e9bda24c
merge branch with master
2020-11-05 16:31:18 +01:00
Miriam Baglioni
be5ed8f554
added check to avoid sending empty metadata.
2020-11-05 16:10:17 +01:00
Claudio Atzori
2148a51fae
minor changes
2020-11-05 11:24:12 +01:00
Claudio Atzori
4625b7486e
code formatting
2020-11-04 18:12:43 +01:00
Claudio Atzori
f5f346dd2b
Merge pull request 'dump' ( #50 ) from miriam.baglioni/dnet-hadoop:dump into master
LGTM
2020-11-04 18:07:01 +01:00
Miriam Baglioni
e9ac471ae9
removed dependency from classes for the pid graph dump
2020-11-04 18:04:42 +01:00
Miriam Baglioni
b90a945c49
removed property files for pid graph dump
2020-11-04 17:28:33 +01:00
Miriam Baglioni
bac307155a
removed properties specific for pid graph dump
2020-11-04 17:28:04 +01:00
Miriam Baglioni
9c9d50f486
removed code specific for pid graph dump
2020-11-04 17:26:22 +01:00
Miriam Baglioni
5669890934
removed commented lines
2020-11-04 17:15:21 +01:00
Miriam Baglioni
6a89f59be9
removed commented lines
2020-11-04 17:13:59 +01:00
Miriam Baglioni
56150d7e5e
removed all code related to the dump of pids graph
2020-11-04 17:13:12 +01:00
Miriam Baglioni
16c54a96f8
removed pid dump
2020-11-04 17:11:32 +01:00
Claudio Atzori
e5da4ee9b1
dedup workflow using the common PidComparator
2020-11-04 15:02:02 +01:00
Miriam Baglioni
0cac5436ff
Merge branch 'dump' of code-repo.d4science.org:miriam.baglioni/dnet-hadoop into dump
2020-11-04 13:21:11 +01:00
Alessia Bardi
51808b5afd
Updated descriptions
2020-11-04 12:29:48 +01:00
Alessia Bardi
e6becf8659
Updated descriptions
2020-11-04 12:17:57 +01:00
Alessia Bardi
0abe0eee33
Updated descriptions
2020-11-04 12:15:30 +01:00
Alessia Bardi
f6ab238f5d
Updated descriptions
2020-11-04 11:50:47 +01:00
Sandro La Bruzzo
3581244daf
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
2020-11-04 09:04:22 +01:00
Sandro La Bruzzo
66efb39634
implemented merge scholix
2020-11-04 09:04:01 +01:00
Miriam Baglioni
c010a8442f
fixed issue on test code
2020-11-03 17:26:51 +01:00
Miriam Baglioni
8ec7a61188
merge branch with master
2020-11-03 16:59:08 +01:00
Miriam Baglioni
c209284ca7
new schemas for the entities in the dump with added descriptions
2020-11-03 16:58:08 +01:00
Miriam Baglioni
08806deddf
added the splitSize non mandatory parameter. Default size 10G
2020-11-03 16:57:34 +01:00
Miriam Baglioni
7d2eda43ca
added new non mandatory property publish to determine whether to publish the upload or leave it pending. Default value false
2020-11-03 16:57:01 +01:00
Miriam Baglioni
cbbb1bdc54
moved business logic to new class in common for handling the zip of the archives
2020-11-03 16:55:50 +01:00
Miriam Baglioni
d4382b54df
moved the tar archive with max size on common module
2020-11-03 16:54:50 +01:00
Claudio Atzori
86d6fbe95b
refactoring: CleaningFunctions and OafMapperUtils moved in dhp-common
2020-11-03 12:19:46 +01:00
Claudio Atzori
8471888ad3
Merge branch 'graph_cleaning' into stable_ids
2020-11-03 11:52:47 +01:00
Claudio Atzori
5310e56dba
remove empty PIDs
2020-11-03 11:52:10 +01:00
Claudio Atzori
3fcd669e99
result merge operation leverages a custom ResultTypeComparator in the aggregator graph construction
2020-11-03 10:53:23 +01:00
Claudio Atzori
8e7f81c5f5
code formatting
2020-11-02 14:25:00 +01:00
Claudio Atzori
09e44dabff
Merge branch 'master' into stable_ids
2020-11-02 12:16:01 +01:00
Sandro La Bruzzo
754c86f33e
fixed test to work on jenkins
2020-11-02 09:35:01 +01:00
Sandro La Bruzzo
39337d8a8a
fixed test
2020-11-02 09:26:25 +01:00
Dimitris
32bf943979
Changes to download only updates
2020-11-02 09:08:25 +02:00
Miriam Baglioni
dabb33e018
changed the discriminant by which to split the file
2020-10-30 17:52:22 +01:00