Miriam Baglioni
|
7a1b440413
|
[SDG] logic to create unresolved entities out of SDG input. This also changes some classes related to FOS to reuse the same code. The code under createunresolvedentities creates results with the merged update of the inputs provided (bip at the level of the instance, fos and sdg for subjects)
|
2021-12-23 13:24:28 +01:00 |
Miriam Baglioni
|
34ac56565d
|
refactoring
|
2021-12-22 16:28:11 +01:00 |
Miriam Baglioni
|
813f856d3f
|
[BipFinder] removing left over parameter in wf
|
2021-12-22 16:11:12 +01:00 |
Miriam Baglioni
|
e24a7f3496
|
merging with branch beta
|
2021-12-21 13:57:19 +01:00 |
Sandro La Bruzzo
|
3920d68992
|
Fixed workflow generation of delta in datacite
|
2021-12-21 11:41:49 +01:00 |
Miriam Baglioni
|
22d4b5619b
|
[BipFinder Result] last changes to test and resources files
|
2021-12-14 14:54:13 +01:00 |
Miriam Baglioni
|
6fb6236cd4
|
changed the way to produce the AS for bipFinder.
|
2021-12-14 14:51:14 +01:00 |
Miriam Baglioni
|
4eb8276493
|
-
|
2021-12-14 11:12:17 +01:00 |
Sandro La Bruzzo
|
2164a2a889
|
Datacite: code refactor, generated a general Scala SparkApplication from which all the Spark Scala jobs have to inherit
Added a few comments to the Datacite transformation code
|
2021-11-25 10:54:13 +01:00 |
Miriam Baglioni
|
4ec88c718c
|
merge with beta - resolved conflict in pom
|
2021-11-15 10:52:16 +01:00 |
Miriam Baglioni
|
716021546e
|
[Bypass Action Set] minor fix
|
2021-11-12 10:18:01 +01:00 |
Miriam Baglioni
|
935062edec
|
[Bypass Action Set] creation of unresolved entities
|
2021-11-11 16:11:25 +01:00 |
Sandro La Bruzzo
|
034304b33a
|
conflict resolved on merge
|
2021-10-26 09:40:47 +02:00 |
Sandro La Bruzzo
|
aeeebd573b
|
code refactor renamed datacite package
|
2021-10-20 17:37:42 +02:00 |
Sandro La Bruzzo
|
ab3a99d3e9
|
removed old datacite oozie workflow
|
2021-10-20 17:19:47 +02:00 |
Sandro La Bruzzo
|
ae4e99a471
|
Adapted workflow of resolution of PID to work into OpenAIRE data workflow
- Added relations in both directions on all Scholexplorer datasources
|
2021-10-20 17:12:16 +02:00 |
Sandro La Bruzzo
|
7b15b88d4c
|
renamed wrong package, implemented last aggregation workflow for scholexplorer
|
2021-10-15 15:00:15 +02:00 |
Sandro La Bruzzo
|
51a03c0a50
|
refactor code for EBI from dhp-graph-mapper into dhp-aggregation
|
2021-10-14 14:23:13 +02:00 |
Sandro La Bruzzo
|
7387416e90
|
added params skip update to direct transform in OAF, this should be set to true in production
|
2021-10-12 12:36:30 +02:00 |
Sandro La Bruzzo
|
511da98d0c
|
- fixed bug on download pmc Article
- removed unused line of code in SparkCreateActionset
|
2021-10-12 11:47:49 +02:00 |
Sandro La Bruzzo
|
5606014b17
|
code refactor see ticket #7065
|
2021-10-12 08:11:53 +02:00 |
Miriam Baglioni
|
5ec69889db
|
OpenCitations: creation of AS from OC
|
2021-09-27 16:02:06 +02:00 |
Miriam Baglioni
|
f2118d771a
|
first steps in the implementation of the integration of opencitations
|
2021-09-22 15:18:05 +02:00 |
Sandro La Bruzzo
|
9f8a80deb7
|
fixed wrong import of unresolved relation in openaire
|
2021-09-01 14:16:27 +02:00 |
Miriam Baglioni
|
ab8abd61bb
|
GetCSV refactoring - refactoring due to movement of classes
|
2021-08-12 18:11:07 +02:00 |
Miriam Baglioni
|
1d6ac3715b
|
merge branch with beta
|
2021-07-30 11:58:29 +02:00 |
Sandro La Bruzzo
|
b1b0cc3f15
|
fixed wrong package name
|
2021-07-29 13:55:08 +02:00 |
Sandro La Bruzzo
|
3721df7aa6
|
refactoring create actionset of scholexplorer, moved to package dhp-aggregation
|
2021-07-29 10:45:35 +02:00 |
Miriam Baglioni
|
708d0ade34
|
Merge branch 'beta' into hostedbymap
|
2021-07-28 10:37:22 +02:00 |
Sandro La Bruzzo
|
825d9f0289
|
fixed datacite workflow starting from Importing delta
|
2021-07-27 16:09:46 +02:00 |
Miriam Baglioni
|
63553a76b3
|
added code to download gold issn list from unibi
|
2021-07-22 12:01:48 +02:00 |
Sandro La Bruzzo
|
cd17e19044
|
implemented branch workflow to import datacite and crossref in scholexplorer
|
2021-07-08 21:20:19 +02:00 |
Sandro La Bruzzo
|
0cdb7ccdaa
|
added inverse relations to datacite mapping
|
2021-06-04 15:10:20 +02:00 |
Sandro La Bruzzo
|
02ef46535f
|
Merge branch 'stable_ids' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_ids
|
2021-05-31 09:50:15 +02:00 |
Sandro La Bruzzo
|
aeadc5a366
|
updated wf Datacite Import to retrieve the block size as parameter
|
2021-05-31 09:49:53 +02:00 |
Claudio Atzori
|
d512062b58
|
integrating pull #109, H2020Classification
|
2021-05-27 12:22:47 +02:00 |
Sandro La Bruzzo
|
bced804151
|
updated wf Datacite Import to retrieve the block size as parameter
|
2021-05-26 17:06:50 +02:00 |
Miriam Baglioni
|
c844877de2
|
changed the workflow to possibly also parallelize the programme and project preparation steps
|
2021-05-21 14:41:57 +02:00 |
Miriam Baglioni
|
54f6e2f693
|
changed to get the needed information to build the action set as parallel jobs
|
2021-05-21 11:47:00 +02:00 |
Miriam Baglioni
|
9610224671
|
added param to workflow property
|
2021-05-20 18:21:12 +02:00 |
Claudio Atzori
|
b695932ae4
|
integrated pull#108
|
2021-05-20 15:34:04 +02:00 |
Miriam Baglioni
|
dc0ad8d2e0
|
fixed issue related to change in the file name downloaded. Added sheet name as parameter and also a check if the name should change
|
2021-05-20 14:53:53 +02:00 |
Claudio Atzori
|
239d0f0a9a
|
ROR actionset import workflow backported from branch stable_ids
|
2021-05-18 16:12:11 +02:00 |
Michele Artini
|
a278d67175
|
parse input file
|
2021-04-29 11:34:47 +02:00 |
Michele Artini
|
b5cf505cc6
|
partial implementation of the ROR->actionset workflow
|
2021-04-28 16:00:24 +02:00 |
Sandro La Bruzzo
|
fd29307b84
|
updated workflow name
|
2021-04-21 09:21:41 +02:00 |
Sandro La Bruzzo
|
e06c7f32f6
|
updated id figshare as described in #6377
|
2021-04-20 10:18:07 +02:00 |
Sandro La Bruzzo
|
cdfe01bbae
|
improved parallelization on transformation job
|
2021-04-19 15:14:52 +02:00 |
Sandro La Bruzzo
|
616d2ecce2
|
split the workflow collecting datacite into two workflows.
Released on beta
|
2021-03-31 15:45:58 +02:00 |
Sandro La Bruzzo
|
1dfda3624e
|
improved workflow importing datacite
|
2021-03-26 13:56:29 +01:00 |
Claudio Atzori
|
58467aaf1e
|
WIP: transformation workflow error reporting
|
2021-02-17 16:14:41 +01:00 |
Claudio Atzori
|
1abe6d1ad7
|
WIP: collectorWorker error reporting, added report messages
|
2021-02-15 15:08:59 +01:00 |
Claudio Atzori
|
29c6f7e255
|
classes related to the collection workflow moved into common package; implemented MongoDB collection plugins
|
2021-02-12 12:31:02 +01:00 |
Claudio Atzori
|
bae029f828
|
collection_java_xmx allows declaring the heap size allocated for the java actions involved in the metadata collection workflow
|
2021-02-08 18:07:23 +01:00 |
Claudio Atzori
|
50add4c61b
|
added requestDelay to HttpConnector2 configuration; Aggregation workflow constants moved in dhp-common
|
2021-02-08 12:19:38 +01:00 |
Claudio Atzori
|
a8a758925e
|
better logging, WIP: collectorWorker error reporting
|
2021-02-05 19:18:05 +01:00 |
Sandro La Bruzzo
|
4dae5e605d
|
implemented messaging between collection worker and Dnet
|
2021-02-04 15:51:15 +01:00 |
Claudio Atzori
|
e04045089f
|
better logging, WIP: collectorWorker error reporting
|
2021-02-03 17:58:22 +01:00 |
Claudio Atzori
|
53884d12c2
|
code formatting
|
2021-02-02 14:38:03 +01:00 |
Sandro La Bruzzo
|
0634674add
|
implemented transformation test
|
2021-02-02 12:12:14 +01:00 |
Sandro La Bruzzo
|
6ff234d81b
|
Implemented a first prototype of incremental harvesting and transformation using readlock
|
2021-02-01 13:56:05 +01:00 |
Sandro La Bruzzo
|
e423634cb6
|
RollBack in case of error WORKS!!!
|
2021-01-29 17:21:42 +01:00 |
Sandro La Bruzzo
|
0276180039
|
WIP mdstore
transaction implemented on hadoop side
|
2021-01-29 16:42:41 +01:00 |
Sandro La Bruzzo
|
0f8e2ecce6
|
Merged Datacite transform into this branch
|
2021-01-29 10:45:07 +01:00 |
Sandro La Bruzzo
|
99cf3a8ea4
|
Merged Datacite transform into this branch
|
2021-01-28 16:34:46 +01:00 |
Sandro La Bruzzo
|
98b9498b57
|
Removed old messaging system not quite used from collection and Transformation workflow
code refactor
|
2021-01-28 09:51:17 +01:00 |
Sandro La Bruzzo
|
184e7b3856
|
Implemented new Transformation using spark
|
2021-01-27 15:43:08 +01:00 |
Claudio Atzori
|
41500669e2
|
[BIP! Scores integration] merged missing classes from bipFinder branch
|
2021-01-11 14:39:47 +01:00 |
Claudio Atzori
|
03319d3bd9
|
Revert "Merge pull request 'Creation of the action set to include the bipFinder! score' (#62) from miriam.baglioni/dnet-hadoop:bipFinder into master"
This reverts commit add7e1693b, reversing
changes made to f9a8fd8bbd.
|
2020-12-17 12:23:58 +01:00 |
Miriam Baglioni
|
3d62d99d5d
|
fixed issue in workflow variable
|
2020-12-01 15:02:49 +01:00 |
Miriam Baglioni
|
62ff4999e3
|
added workflow and last step of collection and save
|
2020-12-01 14:30:56 +01:00 |
Miriam Baglioni
|
45d06c45c7
|
collecting all the atomic actions for result type and saving them all in the AS path
|
2020-12-01 14:29:18 +01:00 |
Miriam Baglioni
|
db36e11912
|
classes, test classes and resources for production of the actionset to include bipFinder score in results
|
2020-11-30 20:14:23 +01:00 |
Miriam Baglioni
|
43cbd62c2b
|
added classpath.first in the configuration
|
2020-10-01 15:46:34 +02:00 |
Miriam Baglioni
|
cd69c6b023
|
added dependency for the topic file path
|
2020-10-01 15:45:59 +02:00 |
Miriam Baglioni
|
0bf2d0db52
|
added to the workflow the download of the topic excel file and one property needed to get the input path of the topic file in the hdfs filesystem
|
2020-09-28 12:17:22 +02:00 |
Miriam Baglioni
|
782984d8e5
|
added needed parameter
|
2020-05-28 23:52:41 +02:00 |
Miriam Baglioni
|
773735f870
|
added the path to the file containing the projects code from the db
|
2020-05-28 17:30:45 +02:00 |
Miriam Baglioni
|
6a15067a64
|
added one step in the workflow
|
2020-05-28 17:30:09 +02:00 |
Miriam Baglioni
|
b737ed8236
|
added part to read projects from the openaire db to filter out those in the csv file that are not in the db
|
2020-05-28 17:29:21 +02:00 |
Miriam Baglioni
|
1060977272
|
added fs actions to remove and then create the workingDir
|
2020-05-28 10:04:36 +02:00 |
Miriam Baglioni
|
1855453434
|
changed the outputdir of the last step
|
2020-05-27 17:59:36 +02:00 |
Miriam Baglioni
|
4589c428b1
|
generates action sets and saves them in the hdfs path for the action sets
|
2020-05-21 16:30:39 +02:00 |
Miriam Baglioni
|
eb0e47ba53
|
parameters for h2020 programme
|
2020-05-20 10:26:44 +02:00 |
Miriam Baglioni
|
08218d2f3f
|
new workflow with added steps
|
2020-05-19 18:44:25 +02:00 |
Miriam Baglioni
|
9447d78ef3
|
added preparation classes
|
2020-05-19 18:42:50 +02:00 |
Miriam Baglioni
|
f0f14caf99
|
removed script files for shell actions not performed
|
2020-05-18 13:06:16 +02:00 |
Miriam Baglioni
|
23bbac7d7c
|
-
|
2020-05-18 13:05:03 +02:00 |
Miriam Baglioni
|
abc45f2708
|
added dnet-45 HttpConnector and related Classes, produced the POJO for projects and programme
|
2020-05-18 13:04:06 +02:00 |
Miriam Baglioni
|
5a648016ef
|
parameters from the GetFile class
|
2020-05-15 18:18:50 +02:00 |
Miriam Baglioni
|
83c262a483
|
workflow to download the files
|
2020-05-15 18:18:31 +02:00 |
Miriam Baglioni
|
22cb9e0da7
|
simple code to get file from URL
|
2020-05-15 18:18:01 +02:00 |
Claudio Atzori
|
0825321d0b
|
improved unit tests in dhp-aggregation
|
2020-05-05 12:39:04 +02:00 |
Claudio Atzori
|
9147af7fed
|
actionsets migration workflow moved in dhp-workflows/dhp-actionmanager
|
2020-04-20 15:24:33 +02:00 |
Claudio Atzori
|
6b5f9ca9cb
|
raw graph creation workflow moved under dhp-graph-mapper, claims integration is included
|
2020-04-10 17:53:07 +02:00 |
Michele Artini
|
fd57722c69
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-03-25 15:56:49 +01:00 |
Michele Artini
|
0fda2c3a30
|
some tests on db records
|
2020-03-25 09:43:58 +01:00 |
Michele Artini
|
e3760c7f39
|
fix a bug with organization countries
|
2020-03-24 08:43:56 +01:00 |
Claudio Atzori
|
36236dd1c1
|
action migration workflow produces eu.dnetlib.dhp.schema.action.AtomicAction(s)
|
2020-03-19 14:00:38 +01:00 |
Claudio Atzori
|
abe8fb69a2
|
added global properties, moved postprocessing script inside the oozie_app directory
|
2020-03-18 15:43:54 +01:00 |