Sandro La Bruzzo
2506d7a679
Merge branch 'mvn_site_documentation' of code-repo.d4science.org:D-Net/dnet-hadoop into mvn_site_documentation
2021-11-17 11:07:24 +01:00
Sandro La Bruzzo
cded363b55
code refactor, created and moved scala code on the correct maven folder under src/main/scala and src/test/scala
2021-11-17 11:06:35 +01:00
Miriam Baglioni
4094f2bb9a
added integration md file
2021-11-17 10:04:52 +01:00
Miriam Baglioni
ec8b0219ff
[Documentation] Added first page for Integration via unresolved entities generation
2021-11-16 17:41:34 +01:00
Sandro La Bruzzo
2d67020c59
added dhp-enrichment maven site template
2021-11-16 16:01:08 +01:00
Claudio Atzori
bafa2990f3
code formatting
2021-11-15 17:07:16 +01:00
Sandro La Bruzzo
efa09057db
Merge branch 'beta' of code-repo.d4science.org:D-Net/dnet-hadoop into beta
2021-11-15 14:32:09 +01:00
Sandro La Bruzzo
48923e46a1
added documentation to Pubmed Class and also added mvn site for dhp-aggregations
2021-11-15 14:32:01 +01:00
Miriam Baglioni
4ec88c718c
merge with beta - resolved conflict in pom
2021-11-15 10:52:16 +01:00
Miriam Baglioni
6f1a434e90
[Bypass Action Set] Fixed test to consider the new identifier utils
2021-11-15 09:59:23 +01:00
Miriam Baglioni
157d33ebf9
[Bypass Action Set] Refactoring
2021-11-15 09:58:48 +01:00
Miriam Baglioni
92d0e18b55
[Bypass Action Set] used constant DOI instead of "doi"
2021-11-12 10:56:58 +01:00
Miriam Baglioni
881113743f
[Bypass Action Set] refactoring
2021-11-12 10:55:50 +01:00
Miriam Baglioni
47ccb53c4f
[Bypass Action Set] modification for comment #157 (comment)
2021-11-12 10:54:09 +01:00
Miriam Baglioni
716021546e
[Bypass Action Set] minor fix
2021-11-12 10:18:01 +01:00
Miriam Baglioni
935062edec
[Bypass Action Set] creation of unresolved entities
2021-11-11 16:11:25 +01:00
Claudio Atzori
d02caef185
Merge branch 'beta' into hierarchical_orgs_relations
2021-10-27 15:36:29 +02:00
Sandro La Bruzzo
4acfa8fa2e
Scholexplorer Datasource Aggregation:
- Added collectedfrom in the inverse relation generated
Relation resolution:
- increased number of partitions in workflow.xml
- using classid instead of classname to build the pid-dnetId mapping
2021-10-26 17:51:20 +02:00
Sandro La Bruzzo
034304b33a
conflict resolved on merge
2021-10-26 09:40:47 +02:00
Michele Artini
d66e20e7ac
added hierarchy rel in ROR actionset
2021-10-21 15:51:48 +02:00
Sandro La Bruzzo
aeeebd573b
code refactor renamed datacite package
2021-10-20 17:37:42 +02:00
Sandro La Bruzzo
ab3a99d3e9
removed old datacite oozie workflow
2021-10-20 17:19:47 +02:00
Sandro La Bruzzo
ae4e99a471
Adapted workflow of resolution of PID to work into OpenAIRE data workflow
- Added relations in both directions on all Scholexplorer datasources
2021-10-20 17:12:16 +02:00
Miriam Baglioni
1cc09adfaa
Opencitations: changed the test class to mirror the creation or not of duplicate dois for .refs oc original plus added optional parameter to duplicate the relation
2021-10-18 14:11:27 +02:00
Sandro La Bruzzo
7b15b88d4c
renamed wrong package, implemented last aggregation workflow for scholexplorer
2021-10-15 15:00:15 +02:00
Sandro La Bruzzo
51a03c0a50
refactor code for EBI from dhp-graph-mapper into dhp-aggregation
2021-10-14 14:23:13 +02:00
Sandro La Bruzzo
7387416e90
added params skip update to direct transform in OAF, this should be set to true in production
2021-10-12 12:36:30 +02:00
Sandro La Bruzzo
511da98d0c
- fixed bug on download pmc Article
- removed unused line of code in SparkCreateActionset
2021-10-12 11:47:49 +02:00
Sandro La Bruzzo
5606014b17
code refactor see ticket #7065
2021-10-12 08:11:53 +02:00
Sandro La Bruzzo
66702b1973
Added node to update datacite
2021-09-28 08:59:06 +02:00
Miriam Baglioni
5ec69889db
OpenCitations: creation of AS from OC
2021-09-27 16:02:06 +02:00
Miriam Baglioni
f2118d771a
first steps in the implementation of the integration of opencitations
2021-09-22 15:18:05 +02:00
Claudio Atzori
663b1556d7
manually integrating PR#140
2021-09-15 16:40:25 +02:00
Sandro La Bruzzo
aed29156c7
changed behavior of the transformation job so that it doesn't fail at the first error
2021-09-07 19:05:46 +02:00
Sandro La Bruzzo
3c6fc2096c
fix bug on oai iterator that skip record cleaned
2021-09-07 10:46:26 +02:00
Sandro La Bruzzo
9f8a80deb7
fixed wrong import of unresolved relation in openaire
2021-09-01 14:16:27 +02:00
Sandro La Bruzzo
e8b3cb9147
Implemented method to download delta updates in EBI Links
2021-08-30 09:32:45 +02:00
Alessia Bardi
931f430129
Merge branch 'beta' into datasource_model_eosc_beta
2021-08-23 11:57:21 +02:00
Claudio Atzori
baed5e3337
test classes moved in specific components
2021-08-13 12:14:47 +02:00
Claudio Atzori
3359f73fcf
cleanup & best practices
2021-08-13 12:00:42 +02:00
Miriam Baglioni
32fd75691f
refactoring
2021-08-13 10:15:42 +02:00
Miriam Baglioni
5cd5714530
GetCSV refactoring - added ignore annotation for fields not in input csv
2021-08-13 10:06:49 +02:00
Miriam Baglioni
ed183d878e
GetCSV refactoring - modified test classes due to change in the model of projects and programme
2021-08-13 09:28:51 +02:00
Miriam Baglioni
8769dd8eef
GetCSV refactoring - refactoring due to movement of classes
2021-08-12 18:20:56 +02:00
Miriam Baglioni
6b9e1bf2e3
GetCSV refactoring - removing not needed dependency
2021-08-12 18:17:50 +02:00
Miriam Baglioni
d57b2bb927
GetCSV refactoring - removing not needed dependency
2021-08-12 18:12:51 +02:00
Miriam Baglioni
9da74b544a
GetCSV refactoring - refactoring due to movement of classes
2021-08-12 18:12:15 +02:00
Miriam Baglioni
ab8abd61bb
GetCSV refactoring - refactoring due to movement of classes
2021-08-12 18:11:07 +02:00
Miriam Baglioni
335a824e34
GetCSV refactoring - fixed issue
2021-08-12 18:10:10 +02:00
Miriam Baglioni
f0845e9865
GetCSV refactoring - refactoring due to movement of classes
2021-08-12 18:04:58 +02:00
Miriam Baglioni
7a789423aa
GetCSV refactoring - refactoring due to movement of classes
2021-08-12 18:04:27 +02:00
Miriam Baglioni
e9fc3ef3bc
GetCSV refactoring - changed to use the new class to get and write the csv file
2021-08-12 18:03:41 +02:00
Miriam Baglioni
4317211a2b
GetCSV refactoring - refactoring due to movement
2021-08-12 18:03:14 +02:00
Miriam Baglioni
b62cd656a7
GetCSV refactoring - changed the model to store only the information needed
2021-08-12 18:01:10 +02:00
Miriam Baglioni
d36e925277
GetCSV refactoring - moved under model package
2021-08-12 18:00:21 +02:00
Miriam Baglioni
6e84b3951f
GetCSV refactoring - moving classes to dhp-common that have dependency with GetCSV class (that was located in graph-mapper)
2021-08-12 17:57:41 +02:00
Miriam Baglioni
804589eb30
reverting
2021-08-11 17:23:35 +02:00
Miriam Baglioni
d688749ad9
reverting
2021-08-11 17:22:28 +02:00
Miriam Baglioni
524c06e028
reverting
2021-08-11 17:20:30 +02:00
Miriam Baglioni
7aa3260729
reverting
2021-08-11 17:18:45 +02:00
Miriam Baglioni
55fc500d8d
reverting
2021-08-11 17:17:48 +02:00
Miriam Baglioni
8da3a25cf6
merging with branch beta
2021-08-11 15:55:34 +02:00
Claudio Atzori
9f4db73f30
updated/fixed unit tests
2021-08-11 15:02:51 +02:00
Claudio Atzori
61d811ba53
suggestions from intellij
2021-08-11 12:18:20 +02:00
Claudio Atzori
2ee21da43b
suggestions from SonarLint
2021-08-11 12:13:22 +02:00
Miriam Baglioni
1d6ac3715b
merge branch with beta
2021-07-30 11:58:29 +02:00
Sandro La Bruzzo
b1b0cc3f15
fixed wrong package name
2021-07-29 13:55:08 +02:00
Sandro La Bruzzo
3721df7aa6
refactoring create actionset of scholexplorer, moved on package dhp-aggregation
2021-07-29 10:45:35 +02:00
Sandro La Bruzzo
3d8f0f629b
implemented workflow of creation action set for scholexplorer
2021-07-28 16:15:34 +02:00
Miriam Baglioni
cc0d3d8a7b
merging with branch beta
2021-07-28 11:24:46 +02:00
Miriam Baglioni
708d0ade34
Merge branch 'beta' into hostedbymap
2021-07-28 10:37:22 +02:00
Sandro La Bruzzo
16c91203bd
implemented workflow of creation action set for scholexplorer
2021-07-28 10:30:49 +02:00
Sandro La Bruzzo
825d9f0289
fixed datacite workflow starting from Importing delta
2021-07-27 16:09:46 +02:00
Miriam Baglioni
74f801b689
merging with branch beta
2021-07-27 13:18:31 +02:00
Miriam Baglioni
eb07f7f40f
Hosted By Map
2021-07-27 12:27:26 +02:00
Claudio Atzori
a0393607a7
mapping funding relations from Datacite should be done according to the actual result identifier
2021-07-23 18:15:08 +02:00
Sandro La Bruzzo
62ae36a3d2
fixed NPE
2021-07-22 15:41:38 +02:00
Miriam Baglioni
63553a76b3
added code to download gold issn list from unibi
2021-07-22 12:01:48 +02:00
Sandro La Bruzzo
bbe8193930
merged stable ids
2021-07-12 17:00:43 +02:00
Sandro La Bruzzo
cd17e19044
implemented branch workflow to import datacite and crossref in scholexplorer
2021-07-08 21:20:19 +02:00
Claudio Atzori
777536ce91
[aggregation] string values used as regular expressions in the OAI collection classes are defined in a single point as constants, to be reused across the code (PR#122)
2021-07-07 11:23:48 +02:00
Claudio Atzori
bc014023c8
Merge pull request 'to solve the scala SI-3623' ( #122 ) from andreas.czerniak/BrStableId_dnet-hadoop:stable_ids into stable_ids
Reviewed-on: #122
2021-07-07 11:13:51 +02:00
Andreas Czerniak
ebf3f47a02
from&until more OAI2.0 compl., adding tfs
2021-07-07 09:29:49 +02:00
Claudio Atzori
70ded407bb
HttpClient used in metadata collection retries also on 404
2021-07-05 18:04:30 +02:00
Sandro La Bruzzo
db933ebd21
Merge remote-tracking branch 'origin/stable_ids' into stable_id_scholexplorer
2021-06-29 14:16:12 +02:00
Sandro La Bruzzo
7e08655e5f
added relation dates in all scholexplorer Datasources
2021-06-29 12:02:03 +02:00
Claudio Atzori
af42377d0e
HttpClient used in metadata collection retries on 502, 503, 504
2021-06-28 09:34:30 +02:00
Sandro La Bruzzo
ad50415167
Merge remote-tracking branch 'origin/stable_ids' into stable_id_scholexplorer
2021-06-24 17:20:50 +02:00
Sandro La Bruzzo
80e15cc455
implemented mapping from uniprot, pdb and ebi links
2021-06-24 17:20:00 +02:00
Claudio Atzori
5edcc6832a
applying sonarLint suggestions
2021-06-23 09:53:29 +02:00
Sandro La Bruzzo
1dc0c59e20
merged fix thai dates from stable_ids
2021-06-21 10:39:46 +02:00
Sandro La Bruzzo
507e42102a
added pdb to oaf class
2021-06-21 09:36:40 +02:00
Sandro La Bruzzo
3990165d05
changed typologies of unresolved relation
2021-06-18 11:43:59 +02:00
Sandro La Bruzzo
cc0f2b11fb
Implemented mapping from pubmed baseline to OAF
2021-06-16 14:56:24 +02:00
Sandro La Bruzzo
aeb8132627
Merged branch stable_ids
2021-06-14 10:07:29 +02:00
Claudio Atzori
e9e86a237d
Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids
2021-06-11 17:00:02 +02:00
Claudio Atzori
a900bfb874
delegating the date parsing to https://github.com/sisyphsu/dateparser
2021-06-11 16:53:01 +02:00
Sandro La Bruzzo
dd997c49e0
fix wrong relation id
fix date thai ticket #6791
2021-06-10 14:47:18 +02:00
Sandro La Bruzzo
0cdb7ccdaa
added inverse relations to datacite mapping
2021-06-04 15:10:20 +02:00
Sandro La Bruzzo
5b724d9972
added relations to datacite mapping
2021-06-04 10:14:22 +02:00
Sandro La Bruzzo
02ef46535f
Merge branch 'stable_ids' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_ids
2021-05-31 09:50:15 +02:00
Sandro La Bruzzo
aeadc5a366
updated wf Datacite Import to retrieve the block size as parameter
2021-05-31 09:49:53 +02:00
Claudio Atzori
d512062b58
integrating pull #109 , H2020Classification
2021-05-27 12:22:47 +02:00
Sandro La Bruzzo
bced804151
updated wf Datacite Import to retrieve the block size as parameter
2021-05-26 17:06:50 +02:00
Miriam Baglioni
abd88f663d
changed test resource to mirror change in the input file
2021-05-21 15:20:47 +02:00
Miriam Baglioni
c844877de2
changed workflow flow to possibly parallelize also the programme and project preparation steps
2021-05-21 14:41:57 +02:00
Miriam Baglioni
073d76864d
refactoring
2021-05-21 14:41:03 +02:00
Miriam Baglioni
4c8b4a774c
removed not needed code
2021-05-21 14:40:07 +02:00
Miriam Baglioni
53b9d87fec
new prepareProgramme according to the new file
2021-05-21 11:49:31 +02:00
Miriam Baglioni
1ee8f13580
refactoring and added "left" as join type to be 100% sure to get the whole set of projects
2021-05-21 11:49:05 +02:00
Miriam Baglioni
e07c3ba089
due to change in the input file the filtering step is no longer needed
2021-05-21 11:47:43 +02:00
Miriam Baglioni
54f6e2f693
changed to get the needed information to build the action set as parallel jobs
2021-05-21 11:47:00 +02:00
Miriam Baglioni
7180505519
removed non needed variable
2021-05-21 11:46:13 +02:00
Miriam Baglioni
2eb1a8b344
changed because the input file changed
2021-05-21 11:40:20 +02:00
Claudio Atzori
9d725efdc1
reverted implementation of the mdstore client
2021-05-20 18:26:09 +02:00
Miriam Baglioni
9610224671
added param to workflow property
2021-05-20 18:21:12 +02:00
Miriam Baglioni
052c837843
-
2021-05-20 15:54:44 +02:00
Claudio Atzori
b695932ae4
integrated pull#108
2021-05-20 15:34:04 +02:00
Miriam Baglioni
dc0ad8d2e0
fixed issue related to change in the file name downloaded. Added sheet name as parameter and also a check if the name should change
2021-05-20 14:53:53 +02:00
Claudio Atzori
239d0f0a9a
ROR actionset import workflow backported from branch stable_ids
2021-05-18 16:12:11 +02:00
Michele Artini
c1e20de7cf
fixed the deserialization of a json property
2021-05-18 14:00:14 +02:00
Claudio Atzori
23b8883ab1
applied intellij code cleanup
2021-05-14 10:58:12 +02:00
Sandro La Bruzzo
6424cd9062
Added passing of the following parameters:
-varDataSourceId
-varOfficialName
in Each transformation Rule
2021-05-11 15:17:38 +02:00
Sandro La Bruzzo
073dcea2aa
Added passing of the following parameters:
-varDataSourceId
-varOfficialName
in Each transformation Rule
2021-05-11 15:05:58 +02:00
Claudio Atzori
3797543600
MDStoreManager model classes moved in dhp-schemas
2021-05-10 14:32:05 +02:00
Michele Artini
d82071ba6c
originalId with prefix
2021-05-06 15:34:48 +02:00
Claudio Atzori
923d19ea8e
mdstore read lock/unlock when bulk copying records from mongodb to hdfs
2021-05-04 18:06:21 +02:00
Claudio Atzori
ba86835951
using common constants from ModelConstants
2021-05-04 11:51:52 +02:00
Michele Artini
a278d67175
parse input file
2021-04-29 11:34:47 +02:00
Michele Artini
f77ba34126
pid types
2021-04-29 09:50:05 +02:00
Michele Artini
7c5cd86927
annotations and tests
2021-04-29 09:29:19 +02:00
Michele Artini
b5cf505cc6
partial implementation of the ROR->actionset workflow
2021-04-28 16:00:24 +02:00
Claudio Atzori
5afa7d3e0c
core utilities in dhp-common moved in external module dhp-schemas
2021-04-27 15:44:01 +02:00
Sandro La Bruzzo
63c0303137
removed unused import, add log
2021-04-27 12:17:23 +02:00
Claudio Atzori
27ab8a704d
adjusted poms to align with the external dhp-schema module
2021-04-27 10:12:27 +02:00
Claudio Atzori
fa42026590
fixed PersonCleaner extension functions
2021-04-27 10:10:06 +02:00
Claudio Atzori
c2bb03c8b5
depending on external dhp-schemas module
2021-04-23 17:57:35 +02:00
Claudio Atzori
7ed107be53
depending on external dhp-schemas module
2021-04-23 17:52:36 +02:00
Sandro La Bruzzo
fd29307b84
updated workflow name
2021-04-21 09:21:41 +02:00
Claudio Atzori
d0d477cca3
code formatting
2021-04-20 12:50:34 +02:00
Sandro La Bruzzo
e06c7f32f6
updated id figshare as described in #6377
2021-04-20 10:18:07 +02:00
Sandro La Bruzzo
dbe0d0378e
resolved ticket #6377
2021-04-20 09:44:44 +02:00
Sandro La Bruzzo
524e5f3092
Improved parallelization on transformation wf on hadoop
2021-04-19 15:17:25 +02:00
Sandro La Bruzzo
cdfe01bbae
improved parallelization on transformation job
2021-04-19 15:14:52 +02:00
Claudio Atzori
3125cef545
code formatting
2021-04-14 09:11:54 +02:00
Andreas Czerniak
3b694074ff
add xslt, personname cleaner
2021-04-13 07:04:27 +02:00
Claudio Atzori
7941d7be29
WIP: using common definitions from ModelConstants
2021-03-31 18:33:57 +02:00
Claudio Atzori
879e8cc7ef
WIP: using common definitions from ModelConstants
2021-03-31 17:12:01 +02:00
Claudio Atzori
72ce741ea6
WIP: using common definitions from ModelConstants
2021-03-31 17:07:13 +02:00
Sandro La Bruzzo
616d2ecce2
split workflow collecting datacite into two workflows.
Released on beta
2021-03-31 15:45:58 +02:00
Sandro La Bruzzo
1dfda3624e
improved workflow importing datacite
2021-03-26 13:56:29 +01:00
Claudio Atzori
8db248aa13
avoiding error on jenkins compilations: java.net.BindException: Cannot assign requested address: Service 'sparkDriver' failed after 16 retries (on a random free port)!
2021-03-23 09:56:34 +01:00
Sandro La Bruzzo
c73072079d
fix conflicts
2021-03-22 16:36:31 +01:00
Claudio Atzori
61a2551e74
migrated last changes from svn (dnet45)
2021-03-15 17:17:55 +01:00
Claudio Atzori
acbe3119a4
RestCollectorPlugin imported from dnet45
2021-03-08 09:44:09 +01:00
Claudio Atzori
fa7930d2e2
merging contributions from PR#97
2021-03-05 15:45:28 +01:00
Claudio Atzori
55f6ff5f55
README.md for aggregation workflows
2021-03-03 16:18:34 +01:00
Claudio Atzori
36f750cd1d
removed unused classes
2021-03-03 10:22:29 +01:00
Claudio Atzori
b73dce3e3a
more logging on the MDStore mongodb client. Forcing UTF_8 encoding on the content
2021-03-03 10:17:16 +01:00
Claudio Atzori
e76c4f62c1
MetadataRecord moved in dhp-schemas
2021-02-26 10:58:48 +01:00
Claudio Atzori
7df2461ccc
indent XML records collected from oai-pmh endpoints
2021-02-25 16:19:12 +01:00
Claudio Atzori
b830e33392
mdstore collector plugin
2021-02-25 12:30:30 +01:00
Claudio Atzori
271e88537b
code formatting
2021-02-25 12:28:56 +01:00
Claudio Atzori
9c899f4433
cleanup on transformation functions and the relative tests
2021-02-24 15:07:59 +01:00
Claudio Atzori
fc3fa5e343
implemented mdstore collector plugin
2021-02-24 15:07:24 +01:00
Claudio Atzori
e7eba9f7e7
WIP: transformation workflow error reporting; cleanup
2021-02-17 16:54:08 +01:00
Claudio Atzori
58467aaf1e
WIP: transformation workflow error reporting
2021-02-17 16:14:41 +01:00
Claudio Atzori
cc88701f29
retry for any Socket exception
2021-02-17 16:13:54 +01:00
Claudio Atzori
545f8f3e48
using jackson objectmapper instead of GSon to serialise the aggregation report
2021-02-17 12:15:00 +01:00
Claudio Atzori
b592d78bb4
WIP: collectorWorker error reporting, generalised reported implementation
2021-02-17 10:28:01 +01:00
Claudio Atzori
cf27905a71
WIP: collectorWorker error reporting, added report messages
2021-02-16 16:53:14 +01:00
Claudio Atzori
1abe6d1ad7
WIP: collectorWorker error reporting, added report messages
2021-02-15 15:08:59 +01:00
Claudio Atzori
523a6bfa97
Merge pull request 'first commit to the correct branch' ( #94 ) from andreas.czerniak/BrAggr_dnet-hadoop:hadoop_aggregator into hadoop_aggregator
Looks good to me, thanks Andreas!
2021-02-15 12:15:31 +01:00
Sandro La Bruzzo
7edcc87ed4
changed xslt behaviour on failure
2021-02-12 17:27:08 +01:00
Sandro La Bruzzo
6a37c7f175
merge fixed
2021-02-12 16:38:47 +01:00
Sandro La Bruzzo
b3f5c2351d
Merge branch 'hadoop_aggregator' of code-repo.d4science.org:D-Net/dnet-hadoop into hadoop_aggregator
Conflicts:
dhp-workflows/dhp-aggregation/src/test/java/eu/dnetlib/dhp/transformation/TransformationJobTest.java
2021-02-12 16:37:14 +01:00
Sandro La Bruzzo
f216277219
Implemented cleaning date
2021-02-12 16:34:52 +01:00
Andreas Czerniak
5a9017cf18
clone, min. changes, test, run
2021-02-12 14:32:36 +01:00
Claudio Atzori
aa55dedb8a
Merge branch 'hadoop_aggregator' of https://code-repo.d4science.org/D-Net/dnet-hadoop into hadoop_aggregator
2021-02-12 12:31:05 +01:00
Claudio Atzori
29c6f7e255
classes related to the collection workflow moved into common package; implemented MongoDB collection plugins
2021-02-12 12:31:02 +01:00
Sandro La Bruzzo
17e6f1934e
fixed NPE on cleaner
2021-02-12 11:48:11 +01:00
Sandro La Bruzzo
ebcc3ec14f
updated wrong datacite identifier in transformation
2021-02-11 16:25:51 +01:00
Claudio Atzori
bae029f828
collection_java_xmx allows to declare the heap size allocated for the java actions involved in the metadata collection workflow
2021-02-08 18:07:23 +01:00
Claudio Atzori
bebc54d5bf
seq file storing native records is now compressed
2021-02-08 18:06:25 +01:00
Claudio Atzori
50add4c61b
added requestDelay to HttpConnector2 configuration; Aggregation workflow constants moved in dhp-common
2021-02-08 12:19:38 +01:00
Claudio Atzori
40df0f987d
better logging, WIP: collectorWorker error reporting; common functions moved in DHPUtils
2021-02-06 20:12:00 +01:00
Claudio Atzori
a8a758925e
better logging, WIP: collectorWorker error reporting
2021-02-05 19:18:05 +01:00
Claudio Atzori
730973679a
Merge branch 'hadoop_aggregator' of https://code-repo.d4science.org/D-Net/dnet-hadoop into hadoop_aggregator
2021-02-04 17:25:00 +01:00
Claudio Atzori
deb85706db
imported HttpConnector from https://svn.driver.research-infrastructures.eu/driver/dnet45/modules/dnet-modular-collector-service/trunk/src/main/java/eu/dnetlib/data/collector/plugins/HttpConnector.java as HttpConnector2
2021-02-04 17:24:52 +01:00
Sandro La Bruzzo
4dae5e605d
implemented messaging between collection worker and Dnet
2021-02-04 15:51:15 +01:00
Claudio Atzori
72c57b28fa
switched project version to 1.2.4-branch_hadoop_aggregator-SNAPSHOT
2021-02-04 14:08:18 +01:00
Claudio Atzori
40764cf626
better logging, WIP: collectorWorker error reporting
2021-02-04 14:06:02 +01:00
Sandro La Bruzzo
69c253710b
fixed test
2021-02-04 10:30:49 +01:00
Claudio Atzori
e04045089f
better logging, WIP: collectorWorker error reporting
2021-02-03 17:58:22 +01:00
Claudio Atzori
0e8a4f9f1a
better logging, WIP: collectorWorker error reporting
2021-02-03 12:33:41 +01:00
Claudio Atzori
53884d12c2
code formatting
2021-02-02 14:38:03 +01:00
Claudio Atzori
ac46c247d2
code formatting
2021-02-02 14:24:00 +01:00
Claudio Atzori
bde14b149a
fixed transformation target paths
2021-02-02 12:49:29 +01:00
Claudio Atzori
ca4391aa1c
minor changes
2021-02-02 12:44:04 +01:00
Claudio Atzori
bb89b99b24
code formatting
2021-02-02 12:34:14 +01:00
Claudio Atzori
75807ea5ae
factored out constants
2021-02-02 12:28:21 +01:00
Sandro La Bruzzo
0634674add
implemented transformation test
2021-02-02 12:12:14 +01:00
Claudio Atzori
8eaa1fd4b4
WIP: metadata collection in INCREMENTAL mode and relative test
2021-02-01 19:29:10 +01:00
Sandro La Bruzzo
bead34d11a
code refactor
2021-02-01 14:58:06 +01:00
Sandro La Bruzzo
6ff234d81b
Implemented a first prototype of incremental harvesting and transformation using readlock
2021-02-01 13:56:05 +01:00
Sandro La Bruzzo
b6b835ef49
update transformation Factory to get Transformation Rule by Id and not by Title
2021-02-01 08:49:42 +01:00
Sandro La Bruzzo
e423634cb6
RollBack in case of error WORKS!!!
2021-01-29 17:21:42 +01:00
Sandro La Bruzzo
8ee82576c6
Collection on Refresh WORKS!!!
2021-01-29 17:02:46 +01:00
Sandro La Bruzzo
0276180039
WIP mdstore
transaction implemented on hadoop side
2021-01-29 16:42:41 +01:00
Sandro La Bruzzo
0f8e2ecce6
Merged Datacite transform into this branch
2021-01-29 10:45:07 +01:00
Sandro La Bruzzo
99cf3a8ea4
Merged Datacite transform into this branch
2021-01-28 16:34:46 +01:00
Sandro La Bruzzo
98b9498b57
Removed old messaging system not quite used from collection and Transformation workflow
code refactor
2021-01-28 09:51:17 +01:00
Sandro La Bruzzo
184e7b3856
Implemented new Transformation using spark
2021-01-27 15:43:08 +01:00
Sandro La Bruzzo
ffb092b8d3
removed duplicate code HttpConnector.java
2021-01-25 15:05:37 +01:00
Sandro La Bruzzo
cda210a2ca
changed documentation since it didn't reflect the current status
2021-01-25 14:17:42 +01:00
Claudio Atzori
41500669e2
[BIP! Scores integration] merged missing classes from bipFinder branch
2021-01-11 14:39:47 +01:00
Claudio Atzori
2a7a10809e
[BIP! Scores integration] merged missing classes from bipFinder branch
2021-01-11 10:05:02 +01:00
Claudio Atzori
d6686dd7cf
merged from master
2021-01-08 18:16:12 +01:00
Claudio Atzori
34229970e6
[BIP! Scores integration] Create updates as Result rather than subclasses; Result considers also metrics in the mergeFrom operation
2021-01-08 16:29:17 +01:00
Claudio Atzori
1361c9eb0c
[BIP! Scores integration] Create updates as Result rather than subclasses; Result considers also metrics in the mergeFrom operation
2021-01-07 10:07:30 +01:00
Claudio Atzori
2e503ee101
code formatting
2020-12-17 13:47:38 +01:00
Claudio Atzori
03319d3bd9
Revert "Merge pull request 'Creation of the action set to include the bipFinder! score' ( #62 ) from miriam.baglioni/dnet-hadoop:bipFinder into master"
This reverts commit add7e1693b, reversing changes made to f9a8fd8bbd.
2020-12-17 12:23:58 +01:00
Miriam Baglioni
888175baf7
added java doc
2020-12-01 18:36:29 +01:00
Miriam Baglioni
3d62d99d5d
fixed issue in workflow variable
2020-12-01 15:02:49 +01:00
Miriam Baglioni
17680296b9
removed unnecessary variable and unused method
2020-12-01 15:02:31 +01:00
Miriam Baglioni
5b3ed70808
refactoring
2020-12-01 14:31:34 +01:00
Miriam Baglioni
62ff4999e3
added workflow and last step of collection and save
2020-12-01 14:30:56 +01:00
Miriam Baglioni
45d06c45c7
collecting all the atomic actions for result type and save them all in the AS path
2020-12-01 14:29:18 +01:00
Miriam Baglioni
0051ebede5
extending test
2020-12-01 12:43:03 +01:00
Miriam Baglioni
719da15f04
added test resources
2020-12-01 12:42:30 +01:00
Miriam Baglioni
db36e11912
classes, test classes and resources for production of the actionset to include bipFinder score in results
2020-11-30 20:14:23 +01:00
Sandro La Bruzzo
66efb39634
implemented merge scholix
2020-11-04 09:04:01 +01:00
Miriam Baglioni
4905739be6
changed resource file to mirror change in business logic
2020-10-30 17:02:57 +01:00
Miriam Baglioni
b40360ebfb
changed the code to mirror the changed decision in the classification level and programme description labels
2020-10-30 17:02:30 +01:00
Miriam Baglioni
696409fb9f
disabled tests because needing remote resource
2020-10-30 17:01:48 +01:00
Miriam Baglioni
a2ce527fae
changed to match the requirements for short titles in level and long titles in classification
2020-10-20 17:03:25 +02:00
Claudio Atzori
5f7b75f5c5
code formatting
2020-10-07 13:22:54 +02:00
Miriam Baglioni
061527f06e
adding short description
2020-10-05 13:54:39 +02:00
Miriam Baglioni
0c12d7bdd8
adding short description
2020-10-05 11:39:55 +02:00
Miriam Baglioni
fc2f7636be
removed not used code
2020-10-02 12:33:52 +02:00
Miriam Baglioni
4aec347351
refactoring
2020-10-01 16:23:52 +02:00
Miriam Baglioni
61946b4092
refactoring
2020-10-01 16:22:48 +02:00
Miriam Baglioni
7e6d35e56c
added the link to the excel file related to topic
2020-10-01 15:53:31 +02:00
Miriam Baglioni
43cbd62c2b
added classpath.first in the configuration
2020-10-01 15:46:34 +02:00
Miriam Baglioni
cd69c6b023
added dependency for the topic file path
2020-10-01 15:45:59 +02:00
Miriam Baglioni
771cde3d05
moved the library version to global pom
2020-10-01 15:43:47 +02:00
Miriam Baglioni
632351c0da
modified test resources to mirror the changes in the code
2020-10-01 15:43:02 +02:00
Miriam Baglioni
ebc1c5513f
modified test resources to mirror the changes in the code
2020-10-01 15:42:29 +02:00
Miriam Baglioni
3a374c34b6
fixed null pointer exception
2020-10-01 15:41:01 +02:00
Miriam Baglioni
83ea746163
added check to the test
2020-10-01 15:40:28 +02:00
Miriam Baglioni
6e5db85b32
-
2020-10-01 11:51:11 +02:00
Miriam Baglioni
a46179f61c
refactoring
2020-10-01 11:22:01 +02:00
Miriam Baglioni
b90bee124b
removing rows that are empty from those imported
2020-10-01 11:16:49 +02:00
Miriam Baglioni
c107f193c9
refactoring
2020-10-01 11:16:22 +02:00
Miriam Baglioni
706a80a29a
added test to check that separator '-' (not hyphen) will be recognized
2020-10-01 10:38:31 +02:00
Miriam Baglioni
3dca586b3b
refactoring
2020-10-01 10:34:48 +02:00
Miriam Baglioni
416bda6066
changed the programme.description by using the same value used in the classification instead of the short title or the title
2020-10-01 10:31:33 +02:00
Miriam Baglioni
f6587c91f3
added comparison to a char that looks like '-' but is not
2020-10-01 10:30:26 +02:00
Miriam Baglioni
7e73bb88b3
changed the logic to add the topic description to the project
2020-09-28 17:21:43 +02:00
Miriam Baglioni
0a035e3630
-
2020-09-28 17:20:49 +02:00
Miriam Baglioni
16bee2084d
added the topic code to the project subset
2020-09-28 17:20:11 +02:00
Miriam Baglioni
0bf2d0db52
added to the workflow the download of the topic excel file and one property needed to get the input path of the topic file in the hdfs filesystem
2020-09-28 12:17:22 +02:00
Miriam Baglioni
c2abde4d9f
changed the implementation of Atomic Actions creation by exploiting the topic information obtained from the cordis excel file
2020-09-28 12:16:34 +02:00
Miriam Baglioni
d930b8d3fc
changed the query to get only the code of the project and not the optional1 (topic code) and optional2 (topic description)
2020-09-28 12:15:48 +02:00
Miriam Baglioni
f8f5cfd5cc
removed the part added to set the topic code and description in the step of project preparation
2020-09-28 12:13:33 +02:00
Miriam Baglioni
9e19c9a221
remove the topic description from the values in the CSVProject class
2020-09-28 12:11:03 +02:00
Miriam Baglioni
6d8b932e40
refactoring
2020-09-28 12:06:56 +02:00
Miriam Baglioni
b77f166549
changed the package name from csvutils to utils
2020-09-28 12:05:47 +02:00
Miriam Baglioni
e33e3277de
added needed dependency to read the excel file
2020-09-28 12:03:14 +02:00
Miriam Baglioni
f4739a371a
code to get the information related to the topic association between code and description.
2020-09-28 12:02:48 +02:00
Miriam Baglioni
12c2dfc268
modified the resource to consider the information added to the model
2020-09-25 14:17:23 +02:00
Miriam Baglioni
969fa8d96e
fixed issue and changed the transformation of the programme file to consider the new model
2020-09-25 13:32:34 +02:00
Miriam Baglioni
e917281822
-
2020-09-24 15:24:05 +02:00
Miriam Baglioni
9f54f69e6d
added topic information
2020-09-24 15:23:35 +02:00
Miriam Baglioni
d6206d6e63
add the topic description to the action set associated to the project
2020-09-24 15:22:40 +02:00
Miriam Baglioni
6b50226f3b
added topic code and topic description
2020-09-24 15:21:49 +02:00
Miriam Baglioni
15af1f527e
modified to consider the topic information
2020-09-24 15:20:56 +02:00
Miriam Baglioni
609ff17cfc
now the commission gives us the framework programme (FP7 - H2020) so use this information to filter out programmes not associated to H2020
2020-09-24 15:19:31 +02:00
Miriam Baglioni
b66f930466
Added optional1 and optional2 information to the files read from the db. Optional1 contains the topic code and optional2 contains the topic description
2020-09-24 15:16:56 +02:00
Miriam Baglioni
860e6d38a6
added topic description to the CSV project variables
2020-09-24 15:15:26 +02:00
Miriam Baglioni
1d84cf19a6
added new line to resource file
2020-09-23 17:32:22 +02:00
Miriam Baglioni
f0c476b6c9
modification to the test classes to consider h2020classification
2020-09-23 17:31:49 +02:00
Miriam Baglioni
2cba3cb484
modification to the classes building the actionset to consider the h2020classification
2020-09-23 17:31:15 +02:00
Miriam Baglioni
1069cf243a
modification to the schema to consider the H2020classification of the programme. The field Programme has been moved inside the H2020classification that is now associated to the Project. Programme is no longer associated directly to the Project but via H2020CLassification
2020-09-22 14:38:00 +02:00
Claudio Atzori
9cd27183b6
[maven-release-plugin] prepare for next development iteration
2020-06-22 11:27:44 +02:00
Claudio Atzori
1e3dab0631
[maven-release-plugin] prepare release dhp-1.2.3
2020-06-22 11:27:39 +02:00
Claudio Atzori
306669209f
code formatting
2020-06-16 16:54:44 +02:00
Claudio Atzori
603b1bd0bb
Merge branch 'master' into dhp_oaf_model
2020-06-16 15:43:59 +02:00
Claudio Atzori
c4d9f1837f
[maven-release-plugin] prepare for next development iteration
2020-06-12 12:21:08 +02:00
Claudio Atzori
f0746a7605
[maven-release-plugin] prepare release dhp-1.2.2
2020-06-12 12:21:03 +02:00
Claudio Atzori
a2fdf85ba1
WIP: graph cleaner implementation
2020-06-09 19:52:53 +02:00
Miriam Baglioni
dfa4997a4f
removed commented code
2020-05-29 10:45:18 +02:00
Miriam Baglioni
6f1eea28b6
changed message in log
2020-05-29 10:41:39 +02:00
Miriam Baglioni
8b6e886fb6
added new resource for testing
2020-05-28 23:54:31 +02:00
Miriam Baglioni
6989fb9c8a
changed the project test according to the newly introduced join with the db project codes
2020-05-28 23:53:24 +02:00
Miriam Baglioni
782984d8e5
added needed parameter
2020-05-28 23:52:41 +02:00
Miriam Baglioni
01f7876595
fix issue with flatMap - the return type must not be null
2020-05-28 23:50:32 +02:00
Miriam Baglioni
773735f870
added the path to the file containing the projects code from the db
2020-05-28 17:30:45 +02:00
Miriam Baglioni
6a15067a64
added one step in the workflow
2020-05-28 17:30:09 +02:00
Miriam Baglioni
5309a99a70
modified the PrepareProjects to consider those in the db
2020-05-28 17:29:53 +02:00
Miriam Baglioni
b737ed8236
added part to read projects from the openaire db to filter out those in the csv file that are not in the db
2020-05-28 17:29:21 +02:00
Miriam Baglioni
35b7279147
changed test because data are now saved as SequenceFile and, because of the group by, the number of produced updates decreases
2020-05-28 10:26:12 +02:00
Miriam Baglioni
df44db686a
refactoring
2020-05-28 10:07:00 +02:00
Miriam Baglioni
87b07f4af8
removed unused variables
2020-05-28 10:05:43 +02:00
Miriam Baglioni
1060977272
added fs actions to remove and then create the workingDir
2020-05-28 10:04:36 +02:00
Miriam Baglioni
96d1a3c431
deleted the file where to store the csv files
2020-05-28 10:04:10 +02:00
Miriam Baglioni
669c05c771
added groupBy before creating Actions
2020-05-28 10:00:45 +02:00
Miriam Baglioni
1855453434
changed the outputdir of the last step
2020-05-27 17:59:36 +02:00
Miriam Baglioni
92e3a52e91
merge branch with fork master
2020-05-26 15:57:51 +02:00
Claudio Atzori
7582532e73
[maven-release-plugin] prepare for next development iteration
2020-05-25 19:48:18 +02:00
Claudio Atzori
01c2e93395
[maven-release-plugin] prepare release dhp-1.2.1
2020-05-25 19:48:14 +02:00
Miriam Baglioni
ac8025f469
-
2020-05-22 15:29:41 +02:00
Miriam Baglioni
50ad83b97f
-
2020-05-22 15:27:19 +02:00
Miriam Baglioni
473c6d3a23
produces AtomicActions instead of Projects
2020-05-22 15:26:57 +02:00
Miriam Baglioni
4589c428b1
generates action sets and saves them in the hdfs path for the action sets
2020-05-21 16:30:39 +02:00
Miriam Baglioni
055eec5a77
added resource for prepare project test
2020-05-20 13:54:10 +02:00
Miriam Baglioni
9079bc1f61
-
2020-05-20 13:53:32 +02:00
Miriam Baglioni
67ba4fde57
added test for prepare projects step
2020-05-20 13:53:08 +02:00
Miriam Baglioni
3c0eb12d3e
removed the not zipped files
2020-05-20 10:31:05 +02:00
Miriam Baglioni
c0d9e02340
zipped test resources that are too big
2020-05-20 10:30:25 +02:00
Miriam Baglioni
5e9c9fa87c
tests
2020-05-20 10:29:57 +02:00
Miriam Baglioni
faed7521bf
added resources for testing
2020-05-20 10:29:29 +02:00
Miriam Baglioni
75491482de
added a new preparation step to replicate each project for the programme it is associated to
2020-05-20 10:28:56 +02:00
Miriam Baglioni
eb0e47ba53
parameters for h2020 programme
2020-05-20 10:26:44 +02:00
Miriam Baglioni
08218d2f3f
new workflow with added steps
2020-05-19 18:44:25 +02:00
Miriam Baglioni
457293ccc0
test for the various steps of project update with programme
2020-05-19 18:43:42 +02:00
Miriam Baglioni
9447d78ef3
added preparation classes
2020-05-19 18:42:50 +02:00
Miriam Baglioni
f0f14caf99
removed script files for shell actions not performed
2020-05-18 13:06:16 +02:00
Miriam Baglioni
23bbac7d7c
-
2020-05-18 13:05:03 +02:00
Miriam Baglioni
4f1ff7ba73
added dependency to org.apache.commons commons-csv
2020-05-18 13:04:39 +02:00
Miriam Baglioni
abc45f2708
added dnet-45 HttpConnector and related Classes, produced the POJO for projects and programme
2020-05-18 13:04:06 +02:00
Miriam Baglioni
5a648016ef
parameters from the GetFile class
2020-05-15 18:18:50 +02:00
Miriam Baglioni
83c262a483
workflow to download the files
2020-05-15 18:18:31 +02:00
Miriam Baglioni
22cb9e0da7
simple code to get file from URL
2020-05-15 18:18:01 +02:00
Claudio Atzori
60c40618d3
[maven-release-plugin] prepare for next development iteration
2020-05-11 10:17:14 +02:00
Claudio Atzori
c267d958d5
[maven-release-plugin] prepare release dhp-1.2.0
2020-05-11 10:17:10 +02:00
Claudio Atzori
42f1a2bf94
bumped project version to 1.2.0-SNAPSHOT
2020-05-11 10:05:57 +02:00
Claudio Atzori
0ccc864ad9
[maven-release-plugin] prepare for next development iteration
2020-05-08 17:01:31 +02:00
Claudio Atzori
6e47c724c6
[maven-release-plugin] prepare release dhp-1.1.7
2020-05-08 17:01:27 +02:00
Claudio Atzori
0825321d0b
improved unit tests in dhp-aggregation
2020-05-05 12:39:04 +02:00
Claudio Atzori
439c6255a2
cleanup
2020-04-29 19:09:07 +02:00
Claudio Atzori
6f5b899038
reformatted code according to the updated style descriptor
2020-04-28 11:23:29 +02:00
Claudio Atzori
a0bdbacdae
switched automatic code formatting plugin to net.revelc.code.formatter:formatter-maven-plugin
2020-04-27 14:52:31 +02:00
Claudio Atzori
7a3f8085f7
switched automatic code formatting plugin to net.revelc.code.formatter:formatter-maven-plugin
2020-04-27 14:45:40 +02:00
Claudio Atzori
9147af7fed
actionsets migration workflow moved in dhp-workflows/dhp-actionmanager
2020-04-20 15:24:33 +02:00
Claudio Atzori
d714bfb4d4
collectedfrom field moved in common parent class Oaf.java
2020-04-20 12:25:19 +02:00
Claudio Atzori
ad7a131b18
introduced common project code formatting plugin, works on the commit hook, based on https://github.com/Cosium/git-code-format-maven-plugin , applied to each java class in the project
2020-04-18 12:42:58 +02:00
Claudio Atzori
6b5f9ca9cb
raw graph creation workflow moved under dhp-graph-mapper, claims integration is included
2020-04-10 17:53:07 +02:00
Claudio Atzori
7061d07727
ActionSets migration serializes the output as plain text files instead of SequenceFiles
2020-04-01 14:58:22 +02:00
Claudio Atzori
377e1ba840
[maven-release-plugin] prepare for next development iteration
2020-03-30 20:06:00 +02:00
Claudio Atzori
76d9315129
[maven-release-plugin] prepare release dhp-1.1.6
2020-03-30 20:05:56 +02:00
Michele Artini
ae03948eed
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
2020-03-27 11:47:07 +01:00
Michele Artini
f6e86b44a6
tests
2020-03-27 11:46:37 +01:00
Michele Artini
408be3c632
test and fixed a problem with datacite namespaces
2020-03-27 11:44:50 +01:00
Sandro La Bruzzo
0cd022ad6a
merge with master
2020-03-26 14:08:29 +01:00
Claudio Atzori
c0e825e713
dhp-aggregation workflow tests upgraded to junit5
2020-03-25 17:59:45 +01:00
Michele Artini
ebe45003d9
fixed some junit packages
2020-03-25 16:45:03 +01:00
Michele Artini
d9bfdcd607
updated poms
2020-03-25 16:31:12 +01:00
Michele Artini
fd57722c69
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
2020-03-25 15:56:49 +01:00
Michele Artini
2559299da4
tests
2020-03-25 12:25:00 +01:00
Michele Artini
0fda2c3a30
some tests on db records
2020-03-25 09:43:58 +01:00
Michele Artini
e3760c7f39
fix a bug with organization countries
2020-03-24 08:43:56 +01:00
Claudio Atzori
ecb64e4998
Merge branch 'migration_wfs_regular_all_steps'
2020-03-23 08:57:01 +01:00
Michele Artini
15160032bd
fixed a bug setting some organization fields
2020-03-23 08:39:14 +01:00
Claudio Atzori
36236dd1c1
action migration workflow produces eu.dnetlib.dhp.schema.action.AtomicAction(s)
2020-03-19 14:00:38 +01:00
Claudio Atzori
abe8fb69a2
added global properties, moved postprocessing script inside the oozie_app directory
2020-03-18 15:43:54 +01:00
Claudio Atzori
c7e0730720
compress the output produced by migration steps 1 and 2
2020-03-18 09:34:57 +01:00
Claudio Atzori
2f11e37602
fixed expansion of path variables
2020-03-17 19:41:07 +01:00
Claudio Atzori
2795b0b096
no need to mkdir the all_entities file
2020-03-17 17:22:14 +01:00
Claudio Atzori
19746ad308
when reuseContent, reset ${workingPath}/all_entities
2020-03-17 17:17:06 +01:00
Claudio Atzori
2f0c85eeb3
updated parameters for regular_all_steps workflow, introduced flag 'reuseContent'
2020-03-17 17:04:58 +01:00
Claudio Atzori
b8290b5851
updated parameters for regular_all_steps workflow
2020-03-17 15:45:30 +01:00
Claudio Atzori
4706f24ec5
updated parameters for regular_all_steps workflow
2020-03-17 15:23:54 +01:00
Claudio Atzori
af835f2f98
when migrating actionsets from DM cluster, populate the AtomicAction.targetValue when empty (dedup similarities)
2020-03-15 18:07:59 +01:00
Claudio Atzori
9c84e21b87
added workflow to migrate latest version of each actionset content from DM to OCEAN cluster, mapping the targetValues from the old protobuf data model to the dhp.OAF datamodel
2020-03-13 15:56:52 +01:00
Michele Artini
b6efa9d6ab
Configuration of the SequenceFile Writer
2020-03-05 15:49:14 +01:00
Michele Artini
755eade2fb
fix creation ids
2020-03-04 14:49:45 +01:00
Michele Artini
e7167b996a
logs and closeable
2020-03-04 10:46:36 +01:00
Michele Artini
4b29a121b0
migration using spark in step2
2020-03-02 16:12:14 +01:00
Michele Artini
5445a57102
migration using spark in step2
2020-03-02 16:11:59 +01:00
Michele Artini
93665773ea
Fixed a problem with JavaRDD Union
2020-02-25 15:59:21 +01:00
Michele Artini
5d3739b5cf
migration of claims
2020-02-19 15:11:17 +01:00
Michele Artini
173f1df1e5
saved a query for openaire production database
2020-02-19 10:15:08 +01:00
Sandro La Bruzzo
9a2d74ac82
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
2020-02-19 10:13:45 +01:00
Sandro La Bruzzo
e5d7cdf422
fixed sql query
2020-02-19 10:13:36 +01:00
Sandro La Bruzzo
2b8675462f
refactoring code
2020-02-19 10:07:08 +01:00
Claudio Atzori
6a288625e5
fixed workflow outgoing node
2020-02-17 15:04:33 +01:00
Sandro La Bruzzo
76ee85141a
added oozie job for DNET migration and implemented Spark job for extracting entities
2020-02-17 12:31:44 +01:00
Michele Artini
176c5606bd
aligned with origin/master, aligned model and mapping
2020-02-17 10:40:53 +01:00
Claudio Atzori
a3d0b57b25
[maven-release-plugin] prepare for next development iteration
2020-02-13 18:11:33 +01:00
Claudio Atzori
6ed9a15bc8
[maven-release-plugin] prepare release dhp-1.1.5
2020-02-13 18:11:31 +01:00
Claudio Atzori
49e648f7c3
bumped version
2020-02-13 18:09:31 +01:00
Michele Artini
80cb52593f
bug fixing
2020-02-13 15:34:13 +01:00
Michele Artini
cdea0dae75
bug fixing
2020-02-12 16:34:00 +01:00
Michele Artini
69336195d3
simplifications
2020-02-12 11:12:38 +01:00
Michele Artini
06c2fd6df9
bug fixing
2020-02-11 15:29:50 +01:00
Michele Artini
5fc09b179c
bug fixing
2020-02-11 12:48:03 +01:00
Michele Artini
95740767e0
Ready for tests
2020-02-10 16:04:06 +01:00
Michele Artini
181e8498d4
...
2020-02-07 16:02:49 +01:00
Michele Artini
bb1533a07e
partial commit
2020-02-05 15:35:40 +01:00
Michele Artini
fbb0fc140b
partial implementation of migration
2020-02-04 15:25:47 +01:00
Michele Artini
6bfe2dc96e
partial implementation
2020-01-22 16:00:23 +01:00
Michele Artini
f6eccdde33
partial implementation
2020-01-21 14:17:05 +01:00
Michele Artini
cd114f1c3b
partial update
2020-01-21 12:32:10 +01:00
Michele Artini
b35c59eb42
partial implementation of entities from db
2020-01-20 16:04:19 +01:00
Michele Artini
81f82b5d34
partial implementation of applications to migrate entities
2020-01-17 15:26:21 +01:00
Michele Artini
f7b9a7a9af
entity migration (partial implementation)
2020-01-10 15:55:23 +01:00
Michele Artini
7229fecbcf
fix warnings in poms
2019-12-20 13:41:08 +01:00
Sandro La Bruzzo
abd9034da0
implemented DedupRecord factory with the merge of publications
2019-12-11 15:43:24 +01:00
miconis
4b66b471a4
implementation of the sorting by trust mechanism and the merge of oaf entities
2019-12-10 14:57:16 +01:00
Sandro La Bruzzo
cc63706347
Implemented deduplication on spark
2019-12-06 13:38:00 +01:00
Claudio Atzori
5711e75f67
use ${project.version} whenever possible
2019-11-08 17:41:51 +01:00
Claudio Atzori
7fe6835b47
[maven-release-plugin] prepare for next development iteration
2019-11-07 17:39:30 +01:00
Claudio Atzori
58918967d9
[maven-release-plugin] prepare release dhp-1.0.4
2019-11-07 17:39:27 +01:00
Claudio Atzori
1e7a2ac41d
align parameter names, graph import procedure WIP
2019-11-04 17:41:01 +01:00
Claudio Atzori
f39148dab8
[maven-release-plugin] prepare for next development iteration
2019-11-04 12:34:48 +01:00
Claudio Atzori
34b0e7b40a
[maven-release-plugin] prepare release dhp-1.0.3
2019-11-04 12:34:46 +01:00
Sandro La Bruzzo
fd0ad82111
[maven-release-plugin] prepare for next development iteration
2019-10-31 12:08:51 +01:00
Sandro La Bruzzo
f224613b40
[maven-release-plugin] prepare release dhp-1.0.2
2019-10-31 12:08:49 +01:00
Sandro La Bruzzo
e13c30cc96
[maven-release-plugin] rollback the release of dhp-1.0.2
2019-10-31 12:07:04 +01:00
Sandro La Bruzzo
4da5239203
[maven-release-plugin] prepare release dhp-1.0.2
2019-10-31 12:06:14 +01:00
Sandro La Bruzzo
db8b346edd
[maven-release-plugin] rollback the release of 1.0.1
2019-10-31 11:49:05 +01:00
Sandro La Bruzzo
fc80052173
[maven-release-plugin] prepare for next development iteration
2019-10-31 11:47:42 +01:00
Sandro La Bruzzo
3150c7ce6d
[maven-release-plugin] prepare release 1.0.1
2019-10-31 11:47:40 +01:00
Claudio Atzori
c8bb81cd9a
align dependencies with IIS cluster
2019-10-29 18:10:20 +01:00
Sandro La Bruzzo
5744a64478
added module dhp-graph-mapper
2019-10-24 16:00:28 +02:00
Sandro La Bruzzo
5a8a323f2a
dhp-collection-worker integrated in dhp-workflows
2019-10-24 11:36:59 +02:00
Claudio Atzori
dd1d6fcb01
moved libs in main pom file
2019-10-18 10:50:55 +02:00
Claudio Atzori
c7654b6fe3
renamed collection & transformation oozie workflow files
2019-10-18 09:42:20 +02:00
Claudio Atzori
27db5afdad
integrating the oozie workflow build/deploy/run mechanism, took inspiration from iis
2019-10-17 18:38:30 +02:00
Sandro La Bruzzo
bbb87d0e3d
implemented saxonHE on transformation spark job
2019-10-10 11:33:51 +02:00
Sandro La Bruzzo
4b8c7c279d
Added documentation on a class, and reused ArgumentApplicationParser on dhp-aggregation
2019-10-07 17:02:53 +02:00
Sandro La Bruzzo
53ec9bccca
changed the implementation of RabbitMQ communication
2019-04-16 12:28:01 +02:00
Sandro La Bruzzo
403c13eebf
Implemented message manager, fixed bug on collection worker, implemented Collection and Transform spark job
2019-04-11 15:39:29 +02:00
Sandro La Bruzzo
ded6aef5e1
moved collector worker
2019-04-03 16:05:16 +02:00
Sandro La Bruzzo
c2ecbf5572
moved collector worker
2019-04-03 16:03:36 +02:00
Sandro La Bruzzo
12c65eab4c
implemented command line
2019-03-25 15:18:31 +01:00
Sandro La Bruzzo
6156562893
Added test
2019-03-18 10:47:28 +01:00
Sandro La Bruzzo
e67d9ee1a9
added first implementation of dnet-workflows
2019-03-18 10:44:35 +01:00