Claudio Atzori
9acc32faa6
[stats wf] final touches for the integration of PRs #166, #179 in the master branch
2022-01-12 12:04:31 +01:00
dimitrispie
b053b0178e
Sprint 5 and other changes
2022-01-12 11:23:37 +01:00
Antonis Lempesis
b6b4bc0df9
added first indicator of sprint 5
2022-01-12 11:20:28 +01:00
Antonis Lempesis
e91f06f39b
fixed typos in indicators. Added extra views in monitor
2022-01-12 11:18:28 +01:00
Antonis Lempesis
3ce1976627
fixed column names
2022-01-12 11:14:41 +01:00
Antonis Lempesis
4878d7485c
added usage stats
2022-01-12 11:13:25 +01:00
Antonis Lempesis
a4316bafed
fixed a typo
2022-01-12 11:12:53 +01:00
Antonis Lempesis
bb17e070d8
added result_result relations
2022-01-12 11:09:38 +01:00
Claudio Atzori
a30a98a716
Applying PR#166 in the master branch (Added sprint 3&4 of indicators). Merge commit '0df9574a6f5d9d75bc840decb023561ae941f9d6'
2022-01-12 10:57:19 +01:00
Sandro La Bruzzo
57e2c4b749
formatted code
2022-01-12 09:40:28 +01:00
Claudio Atzori
0f2144b5e0
scalafmt: code formatting
2022-01-11 17:03:44 +01:00
Claudio Atzori
dcd282977c
pulled from beta
2022-01-11 16:59:41 +01:00
Claudio Atzori
4f212652ca
scalafmt: code formatting
2022-01-11 16:57:48 +01:00
Sandro La Bruzzo
0163dadb7f
[doiboost]
- update MAG schema, new field added in version dec-2021
2022-01-11 11:05:44 +01:00
Miriam Baglioni
904e1c2667
Merge pull request 'Affiliation Propagation through semantic relation' (#183) from enrichment into beta
Reviewed-on: D-Net/dnet-hadoop#183
2022-01-07 19:18:16 +01:00
Miriam Baglioni
064f9bbd87
[AFFPropSR] added new parameter for the number of iterations and new code for just one iteration
2022-01-07 18:58:51 +01:00
Miriam Baglioni
b7e450070b
[SDG-FOS] to import SDG file not considering the header
2022-01-07 12:13:26 +01:00
Miriam Baglioni
639190370a
merging with branch beta
2022-01-07 11:29:25 +01:00
Miriam Baglioni
adccc2346a
[SDG-FOS] to lower case for the doi
2022-01-07 11:28:50 +01:00
Claudio Atzori
8ae46ca789
OAF-store-graph mdstores: further fix for PR#180
2022-01-05 15:52:15 +01:00
Claudio Atzori
908294d86e
OAF-store-graph mdstores: further fix for PR#180
2022-01-05 15:49:05 +01:00
Claudio Atzori
3bd3653be9
OAF-store-graph mdstores: save them in text format
2022-01-04 16:39:39 +01:00
Claudio Atzori
3dc48c7ab5
OAF-store-graph mdstores: save them in text format
2022-01-04 16:39:27 +01:00
Claudio Atzori
f82db765db
OAF-store-graph mdstores: save them in text format
2022-01-04 16:39:15 +01:00
Claudio Atzori
8d13effa31
test for the tolerant deserialisation utility method
2022-01-04 16:38:26 +01:00
Claudio Atzori
9458ee7938
serialise records in the OAF-store-graph mdstores in json format. Read them again in the graph construction phase using a tolerant parser to support backward compatible changes in the evolution of the schema
2022-01-04 16:38:09 +01:00
Claudio Atzori
58f8998e3d
OAF-store-graph mdstores: save them in text format
2022-01-04 15:02:09 +01:00
Claudio Atzori
174c3037e1
OAF-store-graph mdstores: save them in text format
2022-01-04 14:40:16 +01:00
Claudio Atzori
045d767013
OAF-store-graph mdstores: save them in text format
2022-01-04 14:23:01 +01:00
Claudio Atzori
bd59b58efb
test for the tolerant deserialisation utility method
2022-01-04 11:26:56 +01:00
Claudio Atzori
a6977197b3
serialise records in the OAF-store-graph mdstores in json format. Read them again in the graph construction phase using a tolerant parser to support backward compatible changes in the evolution of the schema
2022-01-03 17:25:26 +01:00
Miriam Baglioni
4c60ee1718
merging with branch beta
2022-01-03 15:24:02 +01:00
Miriam Baglioni
92fd69e25d
[SDG-FOS] alternative way to get input data to avoid OOM error while getting csv
2022-01-03 15:23:06 +01:00
Claudio Atzori
fe7e5f4748
Merge pull request '[stats wf] result_result relations, usage stats, monitor views, indicator for sprint 5' (#179) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: D-Net/dnet-hadoop#179
2022-01-03 14:52:11 +01:00
Claudio Atzori
bcea4e3a9b
added dnet workflow profile for the orchestration of the simplified and complete graph construction and processing pipeline, where the IIS works on the non-deduplicated graph
2022-01-03 14:33:00 +01:00
Miriam Baglioni
a706ba0c08
Merge pull request 'SDG Integration' (#178) from SDG into beta
Reviewed-on: D-Net/dnet-hadoop#178
2021-12-23 14:50:00 +01:00
Antonis Lempesis
81ee654271
added result_result relations
2021-12-23 15:46:17 +02:00
Antonis Lempesis
7551e52e95
fixed a typo
2021-12-23 15:33:53 +02:00
Miriam Baglioni
7a1b440413
[SDG] logic to create unresolved entities out of SDG input. This also changes some classes related to FOS to reuse the same code. The code under createunresolvedentities creates results with the merged update of the inputs provided (bip at the level of the instance, fos and sdg for subjects)
2021-12-23 13:24:28 +01:00
Claudio Atzori
cccb16900c
https://support.openaire.eu/issues/7330 normalising DOI urls
2021-12-23 12:33:53 +01:00
Miriam Baglioni
2a67ee13ec
[SDG] added model class
2021-12-23 10:37:52 +01:00
Miriam Baglioni
69e9ea9eeb
[Graph Dump] Test for extraction of rels from entities extended
2021-12-23 10:15:30 +01:00
Miriam Baglioni
31b26d48ac
[Graph Dump] fixed issue on extraction of relation between entities and contexts: the relationship name and type were swapped
2021-12-23 10:09:47 +01:00
Miriam Baglioni
10579c0dd0
[FOS] fixed doi value in test
2021-12-22 23:10:16 +01:00
Miriam Baglioni
6116fc5d40
[FOS] added logic to include only different subjects. Test refactoring and extension
2021-12-22 23:04:22 +01:00
Miriam Baglioni
b81efb6a9d
[FOS] changed the mapping between the csv and the model. Changed Test classes and resources
2021-12-22 21:40:35 +01:00
Miriam Baglioni
de6c4c8968
[FOS] creation of the unresolved entities: remove the split for the doi: no longer needed since each row is related to one doi
2021-12-22 16:44:44 +01:00
Miriam Baglioni
34ac56565d
refactoring
2021-12-22 16:28:11 +01:00
Miriam Baglioni
20ef1d657f
refactoring
2021-12-22 16:26:36 +01:00
Miriam Baglioni
813f856d3f
[BipFinder] removing left over parameter in wf
2021-12-22 16:11:12 +01:00
Miriam Baglioni
2c126ed014
[BipFinder] create unresolved entities with measures at the level of the instance
2021-12-22 16:03:41 +01:00
Miriam Baglioni
0807fdb65a
[BipFinder] remove not needed resources
2021-12-22 15:37:00 +01:00
Miriam Baglioni
b5e11a3a0a
[BipFinder] put in common package BipFinder model
2021-12-22 15:33:05 +01:00
Miriam Baglioni
c5739c4266
[BipFinder] create action set for the measures at the level of the result
2021-12-22 15:08:33 +01:00
Miriam Baglioni
da5f6260aa
merging with branch beta
2021-12-22 13:12:02 +01:00
Miriam Baglioni
be0acccf42
Merge branch 'beta' into dump
2021-12-22 12:39:57 +01:00
Antonis Lempesis
16539d7360
added usage stats
2021-12-22 02:54:42 +02:00
Antonis Lempesis
3edd661608
fixed column names
2021-12-21 22:55:04 +02:00
Antonis Lempesis
a4c0cbb98c
fixed typos in indicators. Added extra views in monitor
2021-12-21 15:54:38 +02:00
Miriam Baglioni
e24a7f3496
merging with branch beta
2021-12-21 13:57:19 +01:00
Miriam Baglioni
d1ae219cb4
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-12-21 13:55:53 +01:00
Miriam Baglioni
460e6b95d6
[Graph Dump] -
2021-12-21 13:48:03 +01:00
Sandro La Bruzzo
3920d68992
Fixed workflow generation of delta in datacite
2021-12-21 11:41:49 +01:00
Antonis Lempesis
58996972d9
added first indicator of sprint 5
2021-12-21 03:35:04 +02:00
dimitrispie
c1cdec09a9
Sprint 5 and other changes
2021-12-20 19:23:57 +02:00
Miriam Baglioni
3cc1b7b153
merging with branch beta
2021-12-15 17:25:02 +01:00
Miriam Baglioni
63b648b0dd
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-12-15 12:41:15 +01:00
Antonis Lempesis
f0b523cfa7
removed the too restrictive clause. will discuss again
2021-12-15 12:32:15 +01:00
Sandro La Bruzzo
b881ee5ef8
[scholexplorer]
- implemented generation of scholix of delta update of datacite
2021-12-15 11:25:32 +01:00
Sandro La Bruzzo
63952018c0
[scholexplorer]
- moved SparkRetrieveDataciteDelta into the scala folder
2021-12-15 11:25:32 +01:00
Sandro La Bruzzo
e5bff64f2e
[scholexplorer]
- Minor fix on SparkConvertRDDtoDataset
- first implementation of retrieve datacite dump
2021-12-15 11:25:32 +01:00
Claudio Atzori
1790fa2d44
Merge branch 'beta' into affiliationPropagation
2021-12-14 15:26:56 +01:00
Miriam Baglioni
56409d1281
[Dump] resolved conflicts with beta and merging
2021-12-14 15:03:45 +01:00
Miriam Baglioni
22d4b5619b
[BipFinder Result] last changes to test and resources files
2021-12-14 14:54:13 +01:00
Miriam Baglioni
6fb6236cd4
changed the way to produce the AS for bipFinder.
2021-12-14 14:51:14 +01:00
Miriam Baglioni
573bd17cbb
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-12-14 11:12:25 +01:00
Miriam Baglioni
4eb8276493
-
2021-12-14 11:12:17 +01:00
Miriam Baglioni
936578aaf1
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-12-13 15:01:47 +01:00
Miriam Baglioni
8d755cca80
-
2021-12-13 15:01:40 +01:00
Claudio Atzori
98eb292c59
avoid NPEs merging XMLInstance(s)
2021-12-13 13:27:20 +01:00
Claudio Atzori
5e17247bb6
avoid NPEs merging XMLInstance(s)
2021-12-13 11:48:40 +01:00
Claudio Atzori
b70ecccea0
avoid NPEs merging XMLInstance(s)
2021-12-12 12:37:38 +01:00
Claudio Atzori
c1b6ae47cd
cleaning workflow assigns the proper default instance type when a value could not be cleaned using the vocabularies
2021-12-09 16:47:41 +01:00
Claudio Atzori
eb43eda42a
Merge branch 'beta' into graph_cleaning
2021-12-09 16:46:48 +01:00
Claudio Atzori
41c70c607d
cleaning workflow assigns the proper default instance type when a value could not be cleaned using the vocabularies
2021-12-09 16:44:28 +01:00
Alessia Bardi
cba63e9f82
Merge branch 'beta' into sygma_indexing
2021-12-09 15:52:16 +01:00
Alessia Bardi
e53228401b
style
2021-12-09 15:46:22 +01:00
Claudio Atzori
cd9c51fd7a
vocabulary based cleaning considers also the term label when looking up for a synonym
2021-12-09 14:49:24 +01:00
Claudio Atzori
e6e177dda0
vocabulary based cleaning considers also the term label when looking up for a synonym
2021-12-09 13:57:53 +01:00
Alessia Bardi
6b5d7688a4
#7275 serialize license information in XML records
2021-12-09 13:46:48 +01:00
Miriam Baglioni
b113586207
resolved conflicts
2021-12-07 10:16:14 +01:00
Sandro La Bruzzo
5d51b3dd4a
Merge pull request 'scala_refactor' (#169) from scala_refactor into beta
Reviewed-on: D-Net/dnet-hadoop#169
2021-12-06 15:33:44 +01:00
Miriam Baglioni
d9836f0cf3
[OpenCitations] fixed test when executed one after the other
2021-12-06 15:27:09 +01:00
Miriam Baglioni
d1df01ff1e
[Graph Dump] fixed resource for test
2021-12-06 15:15:48 +01:00
Sandro La Bruzzo
ed0c352799
[test-fixing] fixed wrong test
2021-12-06 15:07:41 +01:00
Miriam Baglioni
96a7d46278
[Graph Dump] fixed tests
2021-12-06 15:06:32 +01:00
Sandro La Bruzzo
e9f285ec4d
[scala-refactor] Module dhp-doiboost:
Moved all scala source into src/main/scala and src/test/scala
2021-12-06 14:24:03 +01:00
Sandro La Bruzzo
bf880e2508
[scala-refactor] Module dhp-graph-mapper:
Moved all scala source into src/main/scala and src/test/scala
2021-12-06 13:57:41 +01:00
Sandro La Bruzzo
7af0bbd0b1
[scala-refactor] Module dhp-aggregation:
Moved all scala source into src/main/scala and src/test/scala
2021-12-06 11:26:36 +01:00
Claudio Atzori
08795cbd30
using helper method from ModelSupport to find the inverse relation descriptor
2021-12-06 10:39:56 +01:00
Miriam Baglioni
f430688ff7
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-12-03 12:36:08 +01:00
Miriam Baglioni
4bb1d43afc
-
2021-12-03 12:35:51 +01:00
Sandro La Bruzzo
f7011b90d8
format code
2021-12-03 11:15:09 +01:00
Claudio Atzori
dd0b2e5244
Merge branch 'beta' into instance_group_by_url
2021-12-03 09:27:58 +01:00
Claudio Atzori
863a2f9db3
avoid to filter OAF records defined as invisible = true
2021-12-03 09:08:12 +01:00
Claudio Atzori
9cac283bec
implemented Instance serialization features requested in https://support.openaire.eu/issues/7156
2021-12-02 17:20:33 +01:00
Miriam Baglioni
d9f80488cc
[GRAPH DUMP] Add one more test to check the filtering of the relations
2021-12-02 14:15:19 +01:00
Miriam Baglioni
58bc3f223a
[GRAPH DUMP] Add filtering for relation we do not want to dump. It is based on the relclass
2021-12-02 14:09:46 +01:00
Miriam Baglioni
8905a39bf3
merging with branch beta
2021-12-02 13:17:29 +01:00
Miriam Baglioni
87eedad898
-
2021-12-02 13:17:19 +01:00
Claudio Atzori
3b19821f3c
added stats computation on the graph hive DB tables
2021-12-02 10:44:10 +01:00
Claudio Atzori
cfa4560769
minor: fixed hive action name
2021-12-02 10:43:36 +01:00
Claudio Atzori
d85af6fc25
[cleaning wf] fixed OAF record navigation, a mapping defined on a container object would have prevented the navigation to continue on its properties
2021-12-01 15:49:15 +01:00
Claudio Atzori
4fe7888817
code formatting
2021-12-01 15:48:15 +01:00
Claudio Atzori
01e5e0142a
added test to verify the relation inverse lookup operation
2021-12-01 09:46:26 +01:00
Claudio Atzori
0df9574a6f
Merge pull request '[stats wf] Added sprint 3&4 of indicators' (#166) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: D-Net/dnet-hadoop#166
2021-11-29 10:40:26 +01:00
Claudio Atzori
1de881b796
resolved conflicts for #165
2021-11-26 16:15:11 +01:00
Claudio Atzori
014e872ae1
[resolution wf] added optional parameter to skip the entity resolution
2021-11-26 15:38:56 +01:00
Claudio Atzori
5c6d328537
code formatting
2021-11-26 15:38:16 +01:00
dimitrispie
09fc2afdca
Added indi_funder_country_collab
Kept only indi_pub_has_cc_licence
2021-11-26 16:13:10 +02:00
Antonis Lempesis
0b4163ee0b
added sprint3,4, removed 2, chaos
2021-11-26 15:58:01 +02:00
dimitrispie
29f69f2f89
Sprint 4
2021-11-26 15:22:04 +02:00
Miriam Baglioni
ac07ed8251
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-11-25 12:32:58 +01:00
Miriam Baglioni
5fd0e610bf
[DOIBOOST Process] fix filtering to filter results with non null id
2021-11-25 12:10:45 +01:00
Sandro La Bruzzo
feea154e89
remove working dir after test
2021-11-25 11:02:38 +01:00
Sandro La Bruzzo
028a8acad8
add test resources
2021-11-25 10:54:47 +01:00
Sandro La Bruzzo
2164a2a889
Datacite: Code Refactor generated a general SparkApplication Scala where all the spark scala have to inherit
Commented a little the Datacite transformation code
2021-11-25 10:54:13 +01:00
Miriam Baglioni
3f9b2ba8ce
[Hosted By Map] fix issue in test
2021-11-22 16:59:43 +01:00
Sandro La Bruzzo
a7cf277d98
Datacite: Removed HostedBy Patch as described in ticket #7219. Now all the records will have hosted by Unknown Repository
2021-11-22 16:03:17 +01:00
Sandro La Bruzzo
483d3039d1
entity resolution: added distcp of missing entities in graph materialization
2021-11-22 15:55:24 +01:00
Sandro La Bruzzo
93fe8ce8b2
entity resolution: fix test
2021-11-22 15:50:43 +01:00
Sandro La Bruzzo
35e20b0647
updated resolution wf:
- generate a new version of the graph
- changed merge from union to join
2021-11-22 11:48:55 +01:00
Miriam Baglioni
fdb75b180e
[Cleaning] added couple of tests for DOIBOOST publications
2021-11-21 16:35:22 +01:00
Miriam Baglioni
0506fa2654
[Graph Dump] changed to mirror the changes in the model
2021-11-19 15:56:25 +01:00
Sandro La Bruzzo
3426451d3f
Merge remote-tracking branch 'origin/beta' into beta
2021-11-19 14:49:04 +01:00
Sandro La Bruzzo
4542a2338b
updated site configuration to deploy on website
2021-11-19 13:44:08 +01:00
Claudio Atzori
e5a2c596b2
Merge branch 'beta' into preserve_openorg_parent_child_relations
2021-11-19 11:35:46 +01:00
Claudio Atzori
f4538f3c4c
cleanup
2021-11-19 11:33:10 +01:00
Claudio Atzori
2b46b87f56
fixed filtering criteria applied in SparkCopyRelationsNoOpenorgs to keep the parent/child relations from OpenOrgs
2021-11-19 11:30:29 +01:00
Miriam Baglioni
9fae872181
[Graph Dump] changed to mirror the changes in the model
2021-11-19 11:25:50 +01:00
Sandro La Bruzzo
fc03c99805
fixed javadocs url after deploying site
2021-11-19 10:46:33 +01:00
Sandro La Bruzzo
0c0d561bc4
added public class into tests to create correct javadoc
2021-11-19 09:54:22 +01:00
Claudio Atzori
62fa61f3cf
merge from beta
2021-11-19 09:23:42 +01:00
Claudio Atzori
bd9a43cefd
Revert to 4094f2bb9a
2021-11-19 09:20:43 +01:00
Claudio Atzori
3a4d925386
Merge branch 'beta' into hierarchical_orgs_relations
2021-11-18 18:07:08 +01:00
Claudio Atzori
3974fa7dc1
Merge branch 'beta' into affiliationPropagation
2021-11-18 18:06:26 +01:00
Claudio Atzori
a24b9f8268
[dedup] trivial refactoring
2021-11-18 17:12:02 +01:00
Claudio Atzori
c0750fb17c
avoid non necessary count operations over large spark datasets
2021-11-18 17:11:31 +01:00
Claudio Atzori
bb5dca7979
cleanup
2021-11-18 17:10:46 +01:00
Miriam Baglioni
793b5a8e5f
Update 'dhp-workflows/dhp-graph-mapper/src/main/java/eu/dnetlib/dhp/oa/graph/dump/ResultMapper.java'
Removing the dump of Measure at the level of the result. We decided not to map it
2021-11-18 14:49:38 +01:00
Miriam Baglioni
5dc5792722
[Graph Dump] Change test resource to mirror the movement of the measure element
2021-11-18 14:39:12 +01:00
Miriam Baglioni
0136a8c266
[Graph Dump] Change test to mirror that measure is at the level of the instance
2021-11-18 14:38:33 +01:00
Miriam Baglioni
1b79c0ee79
merging with branch beta
2021-11-18 11:01:00 +01:00
Antonis Lempesis
cb3adb90f4
Merge branch 'beta' into beta
2021-11-17 14:33:45 +01:00
Antonis Lempesis
c283406829
added Universidad Politécnica de Madrid
2021-11-17 15:33:00 +02:00
Claudio Atzori
e0395719d7
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-11-17 14:17:27 +01:00
Claudio Atzori
82a4e4efae
[cleaning wf] fixed methodology to rule out invalid result titles, based on https://support.openaire.eu/issues/7206
2021-11-17 14:17:22 +01:00
Miriam Baglioni
6d4a1c57ee
[Resolve Entities] Change test dataset to mirror the modification in the creation of the map between the pids and the unresolved
2021-11-17 12:41:52 +01:00
Sandro La Bruzzo
9c82d670b8
make class public in order to create javadoc
2021-11-17 12:31:02 +01:00
Sandro La Bruzzo
1f5ee116ed
code refactor, created and moved scala code on the correct maven folder under src/main/scala and src/test/scala
fixed test
2021-11-17 12:23:52 +01:00
Sandro La Bruzzo
2fd9ceac13
code refactor, created and moved scala code on the correct maven folder under src/main/scala and src/test/scala
2021-11-17 11:35:22 +01:00
Sandro La Bruzzo
2506d7a679
Merge branch 'mvn_site_documentation' of code-repo.d4science.org:D-Net/dnet-hadoop into mvn_site_documentation
2021-11-17 11:07:24 +01:00
Sandro La Bruzzo
cded363b55
code refactor, created and moved scala code on the correct maven folder under src/main/scala and src/test/scala
2021-11-17 11:06:35 +01:00
Miriam Baglioni
4094f2bb9a
added integration md file
2021-11-17 10:04:52 +01:00
Miriam Baglioni
ec8b0219ff
[Documentation] Added first page for Integration via unresolved entities generation
2021-11-16 17:41:34 +01:00
Miriam Baglioni
2bbece2ca5
merging with branch beta
2021-11-16 16:35:40 +01:00
Sandro La Bruzzo
2d67020c59
added dhp-enrichment maven site template
2021-11-16 16:01:08 +01:00
Miriam Baglioni
28ea532ece
[Affiliation Propagation] moved the selection of graph relation as a preparation step
2021-11-16 15:24:19 +01:00
Sandro La Bruzzo
18c1d70ef4
Merge branch 'beta' of code-repo.d4science.org:D-Net/dnet-hadoop into mvn_site_documentation
2021-11-16 15:16:49 +01:00
Sandro La Bruzzo
a1cafaf2e3
added mvn site for dnet-hadoop project
2021-11-16 15:16:28 +01:00
Miriam Baglioni
7c96e3fd46
removed not useful dir
2021-11-16 13:57:26 +01:00
Miriam Baglioni
c7c0c3187b
[AFFILIATION PROPAGATION] Applied some SonarLint suggestions
2021-11-16 13:56:32 +01:00
Miriam Baglioni
c6a9f0a1a8
merging with branch beta
2021-11-16 12:04:40 +01:00
Miriam Baglioni
99d86134f5
[Graph Dump] changed the dump since the measures have been moved to the level of the instance
2021-11-16 12:04:21 +01:00
Claudio Atzori
0a727d325d
[dedup] increased number of partitions in the consistency phase
2021-11-16 08:43:41 +01:00
Claudio Atzori
bafa2990f3
code formatting
2021-11-15 17:07:16 +01:00
Claudio Atzori
668ac25224
[graph resolution] using existing argument parser file name
2021-11-15 17:02:45 +01:00
Claudio Atzori
7d0a03f607
[graph resolution] minor
2021-11-15 14:45:54 +01:00
Claudio Atzori
941a50a2fc
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-11-15 14:42:49 +01:00
Claudio Atzori
7c804acda8
[graph resolution] minor
2021-11-15 14:42:43 +01:00
Sandro La Bruzzo
efa09057db
Merge branch 'beta' of code-repo.d4science.org:D-Net/dnet-hadoop into beta
2021-11-15 14:32:09 +01:00
Sandro La Bruzzo
48923e46a1
added documentation to Pubmed Class and also added mvn site for dhp-aggregations
2021-11-15 14:32:01 +01:00
Claudio Atzori
d2c787d416
[graph resolution] fixed sequence of the workflow steps
2021-11-15 14:31:15 +01:00
Claudio Atzori
975b10b711
[actionmanager] increased spark.sql.shuffle.partitions to 5000
2021-11-15 12:31:45 +01:00
Miriam Baglioni
4ec88c718c
merge with beta - resolved conflict in pom
2021-11-15 10:52:16 +01:00
Miriam Baglioni
6f1a434e90
[Bypass Action Set] Fixed test to consider the new identifier utils
2021-11-15 09:59:23 +01:00
Miriam Baglioni
157d33ebf9
[Bypass Action Set] Refactoring
2021-11-15 09:58:48 +01:00
Miriam Baglioni
6595135a1a
[Dump Schemas] changed the schema of the dumped result according to the modifications in the bestAccessRight type
2021-11-12 11:45:38 +01:00
Miriam Baglioni
43cae4ad88
Merge branch 'dump' of https://code-repo.d4science.org/D-Net/dnet-hadoop into dump
2021-11-12 11:36:54 +01:00
Miriam Baglioni
b3f9370125
merge with beta - resolved conflict in pom
2021-11-12 11:25:26 +01:00
Miriam Baglioni
92d0e18b55
[Bypass Action Set] used constant DOI instead of "doi"
2021-11-12 10:56:58 +01:00
Miriam Baglioni
881113743f
[Bypass Action Set] refactoring
2021-11-12 10:55:50 +01:00
Miriam Baglioni
47ccb53c4f
[Bypass Action Set] modification for comment D-Net/dnet-hadoop#157 (comment)
2021-11-12 10:54:09 +01:00
Miriam Baglioni
ffb0ce1d59
merge with beta - resolved conflict in pom
2021-11-12 10:19:59 +01:00
Miriam Baglioni
716021546e
[Bypass Action Set] minor fix
2021-11-12 10:18:01 +01:00
Sandro La Bruzzo
3469cc2b1d
Merge branch 'beta' of code-repo.d4science.org:D-Net/dnet-hadoop into beta
2021-11-12 09:56:52 +01:00
Sandro La Bruzzo
a7763d2492
removed alternate identifier in resolutionMap
2021-11-12 09:56:45 +01:00
Miriam Baglioni
b8bdabfae9
[Graph Dump] removed OpenAccessRoute from test in best access right
2021-11-11 16:16:48 +01:00
Miriam Baglioni
e5498052e8
[Graph Dump] removed OpenAccessRoute from test in best access right
2021-11-11 16:14:10 +01:00
Miriam Baglioni
935062edec
[Bypass Action Set] creation of unresolved entities
2021-11-11 16:11:25 +01:00
Antonis Lempesis
26f086dd64
removed the too restrictive clause. will discuss again
2021-11-11 12:57:19 +02:00
Claudio Atzori
148289150f
Merge branch 'beta' into doiboost_url
2021-11-11 10:40:19 +01:00
Sandro La Bruzzo
2ca0a436ad
added SparkResolveEntities node to the oozie wf
2021-11-11 10:25:42 +01:00
Sandro La Bruzzo
9cb195314f
implemented and tested resolution of entities
2021-11-11 10:17:40 +01:00
Miriam Baglioni
6d3c4c4abe
merging with branch beta
2021-11-11 08:59:53 +01:00
Miriam Baglioni
8cc50ecee0
[Graph Dump] changed AccessRight with BestAccessRight in the dump and modified the dependency to the schema to the SNAPSHOT
2021-11-11 08:59:20 +01:00
Miriam Baglioni
88b73f4f49
merging with branch beta
2021-11-10 17:00:52 +01:00
Miriam Baglioni
c371b23077
-
2021-11-10 17:00:37 +01:00
Alessia Bardi
fc8fceaac3
create direct link to WT projects as well
2021-11-10 14:11:52 +01:00
Alessia Bardi
6cd91004e3
fixed DOI for Wellcome Trust in mapping relationships from Crossref
2021-11-09 12:22:57 +01:00
Miriam Baglioni
9e214ce0eb
[BypassAS] addition of OC relations
2021-11-09 12:07:19 +01:00
Alessia Bardi
b9d4f115cc
fixed Crossref mapping for SFI projects
2021-11-09 12:04:45 +01:00
Sandro La Bruzzo
6477a40670
implement filter of openCitation
2021-11-09 11:27:12 +01:00
Miriam Baglioni
6f7ca539c6
[BypassAS] update of results for bipFinder and FOS
2021-11-09 11:25:41 +01:00
Miriam Baglioni
a7d50c499b
[BypassAS] prepare FOS subject, test and model for FOS and BipFinder scores
2021-11-08 16:44:19 +01:00
Antonis Lempesis
91354c6068
- fetching all context related results
- storing tables as parquet
2021-11-08 15:15:46 +02:00
Miriam Baglioni
94918a673c
[Graph DUMP] Fix issue for empty originalId list
2021-11-08 10:25:28 +01:00
Claudio Atzori
9cb8e4ad21
Merge branch 'beta' into hierarchical_orgs_relations
2021-11-08 09:40:24 +01:00
Miriam Baglioni
4c70201412
merging with branch beta
2021-11-05 12:29:56 +01:00
Miriam Baglioni
8442efd8d1
[Graph DUMP] Filtering out from the originalIds the id of the result in OpenAIRE
2021-11-05 12:29:22 +01:00
Claudio Atzori
5681e89544
Update 'dhp-workflows/dhp-graph-mapper/src/main/resources/eu/dnetlib/dhp/oa/graph/dump/schemas/result_schema.json'
2021-11-05 12:18:24 +01:00
Miriam Baglioni
a22c29fba1
[Graph DUMP] Filtering out from the originalIds the id of the result in OpenAIRE
2021-11-05 12:08:33 +01:00
Miriam Baglioni
c10ff6928c
[Graph DUMP] add schema of the dump related to the model as in dhp-schemas.2.8.31. Note the measure element at the level of the result has been removed because of issues on where to display it: at the level of the result or at the level of the entity
2021-11-05 11:36:21 +01:00
Miriam Baglioni
0857849a86
[Graph DUMP] Remove dump of measure until it will be clear where to put it (at the level of result or at the level of the instance)
2021-11-05 11:02:37 +01:00
Miriam Baglioni
df7ee77c7a
[DOIBoost Mapping] removed not needed comments
2021-11-04 16:24:07 +01:00
Miriam Baglioni
de63d29b6f
[DOIBoost Mapping] Fix to avoid to produce results with null as identifier (probably due to the filtering function in the factory for the creation of the id)
2021-11-04 16:16:40 +01:00
Miriam Baglioni
d50057b2d9
[DOIBoost Mapping] changed the way to create the url for the instance: we use the Crossref guidelines https://doi.org/doi
2021-11-03 16:59:37 +01:00
Miriam Baglioni
edf55395e9
added test resource
2021-11-03 16:49:30 +01:00
Miriam Baglioni
d97ea82a29
[DOIBoost Mapping] Added test to verify the instance created for Crossref will have just the url related to the doi
2021-11-03 16:45:15 +01:00
Miriam Baglioni
96769b4481
[DOIBoost - Mapping] Changed the logic which brought into the instance urls that should not be there: the url of the doi in the json is reachable from the root (json/"URL"); other urls were added from the links element. Now the mapping from the link element has been removed
2021-11-03 16:43:36 +01:00
Miriam Baglioni
683fe093cf
[DOIBoost - Mapping] Remove the addition of the instance to the MAG publication record
2021-11-03 15:51:26 +01:00
Miriam Baglioni
b2bb8d9d79
[DOIBoost - Mapping] selecting the url from Crossref containing the doi
2021-11-03 15:44:57 +01:00
Miriam Baglioni
779318961c
[DOIBoost - Mapping] removed the url from crossref containing the api.elsevier.com... string in the url
2021-11-03 14:38:52 +01:00
Miriam Baglioni
2480e590d1
[DOIBoost - Mapping] changed the type on which to map dissertation from Crossref: from 006 Doctoral thesis to 0044 Thesis since dissertation could be either Doctoral or master thesis
2021-11-03 14:25:23 +01:00
Miriam Baglioni
b9d124bb7c
[Enrichment: Propagation through parent-child relationships] Added counters, and changed constraint to verify if filtering out the relation (from classname = harvested to classid != propagation)
2021-11-03 13:55:37 +01:00
Sandro La Bruzzo
7bd224f051
implement first version of scholexplorer integration for the generation of final graph
2021-11-02 15:58:15 +01:00
Antonis Lempesis
b97b78f874
removed hardcoded reference
2021-11-02 09:12:49 +01:00
Claudio Atzori
7fa49f6956
Merge pull request 'removed hardcoded reference' (#154) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: D-Net/dnet-hadoop#154
2021-11-02 09:11:30 +01:00
Antonis Lempesis
f78afb5ef9
removed hardcoded reference
2021-11-01 15:42:29 +02:00
Miriam Baglioni
2aca6bfa0a
merging with branch beta
2021-10-29 11:20:45 +02:00
Miriam Baglioni
09f36cffb8
[Enrichment: Propagation through parent-child relationships] First implementation, testing, and wf for propagation of result to organization through semantic relation
2021-10-29 11:20:03 +02:00
Claudio Atzori
1225ba0b92
[resolution] increasing number of partitions to avoid OOM
2021-10-28 16:18:17 +02:00
Sandro La Bruzzo
d9cbca83f7
moved filter on next phase
2021-10-28 16:13:24 +02:00
Claudio Atzori
d02caef185
Merge branch 'beta' into hierarchical_orgs_relations
2021-10-27 15:36:29 +02:00
Sandro La Bruzzo
1be9aa0a5f
Removed filter of datacite items from the raw graph merging phase, Datacite is not an actionset anymore in beta
2021-10-26 17:52:20 +02:00
Sandro La Bruzzo
4acfa8fa2e
Scholexplorer Datasource Aggregation:
- Added collectedfrom in the inverse relation generated
Relation resolution:
- increased number of partitions in workflow.xml
- using classid instead of classname to build the pid-dnetId mapping
2021-10-26 17:51:20 +02:00
Miriam Baglioni
d0ef7d91c5
adding test resource
2021-10-26 17:34:11 +02:00
Sandro La Bruzzo
034304b33a
conflict resolved on merge
2021-10-26 09:40:47 +02:00
Michele Artini
d66e20e7ac
added hierarchy rel in ROR actionset
2021-10-21 15:51:48 +02:00
Claudio Atzori
d147295c2f
avoiding java.io.NotSerializableException: java.util.HashMap
2021-10-21 14:15:57 +02:00
Claudio Atzori
3702fe478d
cleanup
2021-10-21 12:05:02 +02:00
Sandro La Bruzzo
ac36aa7d1c
fixed wrong Encoding during a map phase
2021-10-21 11:35:02 +02:00
Sandro La Bruzzo
aeeebd573b
code refactor renamed datacite package
2021-10-20 17:37:42 +02:00
Sandro La Bruzzo
ab3a99d3e9
removed old datacite oozie workflow
2021-10-20 17:19:47 +02:00
Sandro La Bruzzo
ae4e99a471
Adapted workflow of resolution of PID to work into OpenAIRE data workflow
- Added relations in both directions on all Scholexplorer datasources
2021-10-20 17:12:16 +02:00
Claudio Atzori
cece432adc
[stats] reducing the step22 wait time
2021-10-20 14:16:33 +02:00
Antonis Lempesis
a7376907c2
invalidating metadata before context thingies
2021-10-20 14:16:25 +02:00
Antonis Lempesis
43f4eb492b
fetching affiliated results for 4 orgs in monitor. fixed affiliated orgs in stats db
2021-10-20 14:16:11 +02:00
Claudio Atzori
4f8970f8ed
[stats] reducing the step22 wait time
2021-10-20 14:14:53 +02:00
Claudio Atzori
00b78b9c58
cleanup: mapping contents in the graph already defined in the OAF graph model doesn't require to be aware of the vocabularies
2021-10-20 14:04:45 +02:00
Claudio Atzori
c01dd0c925
registered oaf model classes for the KryoSerializer
2021-10-20 13:55:07 +02:00
Miriam Baglioni
652114c641
[affiliationPropagation] first try. preparation
2021-10-20 11:44:23 +02:00
Claudio Atzori
59f76b50d4
Merge branch 'beta' into hierarchical_orgs_relations
2021-10-20 09:42:35 +02:00
Antonis Lempesis
241dcf6df1
Merge branch 'beta' into beta
2021-10-19 23:54:21 +02:00
Claudio Atzori
515e068a78
Merge branch 'beta' into hierarchical_orgs_relations
2021-10-19 16:46:06 +02:00
Claudio Atzori
512e7b0170
code formatting
2021-10-19 16:19:29 +02:00
Michele Artini
c4fce785ab
fixed a compilation problem of a unit test
2021-10-19 16:18:26 +02:00
Claudio Atzori
e9157c67aa
Merge branch 'beta' into dump
2021-10-19 16:15:03 +02:00
Claudio Atzori
98f37c8d81
WIP: workflow nodes for including Scholexplorer records in the RAW graph
2021-10-19 16:14:40 +02:00
Claudio Atzori
c8850456e9
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-10-19 16:09:54 +02:00
Claudio Atzori
172363e7f1
[broker] integrating PR#147, notification record creation phase separated from indexing on ES
2021-10-19 15:56:27 +02:00
Claudio Atzori
bdffa86c2f
undo last commit
2021-10-19 15:39:38 +02:00
Sandro La Bruzzo
c9870c5122
code formatted
2021-10-19 15:24:59 +02:00
Sandro La Bruzzo
f8329bc110
since dhp-schemas changed, introducing new Relation inverse model, this class has been updated
2021-10-19 15:24:22 +02:00
Claudio Atzori
e471f12d5e
hotfix: recovered implementation removing the hardcoded working_dirs
2021-10-19 12:35:38 +02:00
Claudio Atzori
7a73010acd
WIP: workflow nodes for including Scholexplorer records in the RAW graph
2021-10-19 11:59:16 +02:00
Miriam Baglioni
c7f6cd2591
added again the setting for saXReader
2021-10-19 10:15:26 +02:00
miconis
5f780a6ba1
bug fix in migrate entities: parameter name was wrong
2021-10-18 23:30:40 +02:00
Miriam Baglioni
1315952702
merge with branch beta
2021-10-18 14:17:09 +02:00
Miriam Baglioni
1cc09adfaa
Opencitations: changed the test class to mirror the creation or not of duplicate dois for .refs oc original plus added optional parameter to duplicate the relation
2021-10-18 14:11:27 +02:00
Miriam Baglioni
76d41602be
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-10-18 10:53:22 +02:00
Miriam Baglioni
46f82c7c8f
removed not needed folder deletion
2021-10-18 10:53:16 +02:00
Sandro La Bruzzo
7b15b88d4c
renamed wrong package, implemented last aggregation workflow for scholexplorer
2021-10-15 15:00:15 +02:00
Antonis Lempesis
41ecb1eb61
invalidating metadata before context thingies
2021-10-15 13:42:55 +03:00
Antonis Lempesis
4b7c8dff2d
fetching affiliated results for 4 orgs in monitor. fixed affiliated orgs in stats db
2021-10-14 18:53:35 +03:00
Claudio Atzori
e15a1969a5
applying fix on the DOIBoost construction process that somehow wasn't part of the merge done in 83c90c7180
2021-10-14 14:33:56 +02:00
Sandro La Bruzzo
51a03c0a50
refactor code for EBI from dhp-graph-mapper into dhp-aggregation
2021-10-14 14:23:13 +02:00
Claudio Atzori
14fbf92ad6
Merge branch 'beta' into beta_solr_config
2021-10-14 11:08:44 +02:00
Miriam Baglioni
4b1920f008
changed the working path parameter value to be dependent on the dnet-workflow working dir parameter
2021-10-14 09:18:09 +02:00
Miriam Baglioni
8db39c86e2
added new parameter in the doiboost process workflow to specify a folder for the process of MAG dataset
2021-10-14 09:17:39 +02:00
Claudio Atzori
b292e4a700
[stats wf] added extra logging in the context data retrieval phase
2021-10-13 17:31:53 +02:00
miconis
995c1eddaf
minor change
2021-10-13 17:07:10 +02:00
Miriam Baglioni
5d9cc2452d
changed the working path parameter value to be dependent on the dnet-workflow working dir parameter
2021-10-13 15:33:50 +02:00
miconis
326bf63775
integration of parent child orgs relations
2021-10-13 12:24:48 +02:00
Miriam Baglioni
16b28494a9
added new parameter in the doiboost process workflow to specify a folder for the process of MAG dataset
2021-10-13 11:34:24 +02:00
Miriam Baglioni
63933808d4
added fix for mixing result types, added configuration default to funder subworkflow
2021-10-13 11:28:28 +02:00
Sandro La Bruzzo
7387416e90
added params skip update to direct transform in OAF, this should be set to true in production
2021-10-12 12:36:30 +02:00
Sandro La Bruzzo
511da98d0c
- fixed bug on download pmc Article
...
- removed unused line of code in SparkCreateActionset
2021-10-12 11:47:49 +02:00
Miriam Baglioni
fec40bdd95
merging with branch beta - resolved conflicts
2021-10-12 09:16:36 +02:00
Miriam Baglioni
83f51f1812
refactoring
2021-10-12 09:14:43 +02:00
Sandro La Bruzzo
5606014b17
code refactor see ticket #7065
2021-10-12 08:11:53 +02:00
Claudio Atzori
2f61054cd1
code formatting
2021-10-11 18:29:42 +02:00
Claudio Atzori
83c90c7180
manually merging PR#149 D-Net/dnet-hadoop#149
2021-10-11 18:27:05 +02:00
Serafeim Chatzopoulos
201ce71cc1
Add resultsubject, relprojectname and resultacceptanceyear to __all field
2021-10-11 13:16:39 +03:00
Serafeim Chatzopoulos
e468a7b96b
Add tests to query Solr with different configurations
2021-10-08 16:58:51 +03:00
Serafeim Chatzopoulos
de81007302
Add exploreTestConfig, a new Solr configuration folder
2021-10-08 16:54:56 +03:00
Sandro La Bruzzo
8f99d2af86
Make the doiBoost node point to the correct OpenAIRE organization in relations
2021-10-08 08:35:12 +02:00
Alessia Bardi
c48c43fa9e
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-10-07 17:30:53 +02:00
Alessia Bardi
8d3b60f446
test for patching records for EOSC Future
2021-10-07 17:30:45 +02:00
miconis
611ca511db
set configuration property in openorgs duplicates wf
2021-10-07 15:39:55 +02:00
miconis
9646b9fd98
implementation of the http call for the update of openorgs suggestions
2021-10-07 11:29:11 +02:00
Sandro La Bruzzo
2557bb41f5
Implemented new method for update baseline inside scala node
2021-10-06 16:41:08 +02:00
Sandro La Bruzzo
b84e0cabeb
Implemented new method for update baseline
2021-10-05 16:34:47 +02:00
Michele Artini
d6e1f22408
max numbers of workers for indexing
2021-10-05 15:09:18 +02:00
Michele Artini
210d6c0e6d
generateNotificationsJob and indexNotificationsJob
2021-10-05 13:57:46 +02:00
Michele Artini
69008e20c2
log and tests
2021-10-05 11:58:20 +02:00
Sandro La Bruzzo
f258bbb927
Merge branch 'beta' of code-repo.d4science.org:D-Net/dnet-hadoop into beta
2021-10-05 10:21:50 +02:00
Sandro La Bruzzo
991b06bd0b
removed generation of EBI links from old dump, now EBI link dump is created by another wf
2021-10-05 10:21:33 +02:00
Claudio Atzori
cb7efe12ac
Merge pull request 'beta' ( #146 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: D-Net/dnet-hadoop#146
2021-10-05 10:09:37 +02:00
Michele Artini
8bbaa17335
reimplementation of the conditions cache as a non-static variable
2021-10-05 09:20:37 +02:00
Miriam Baglioni
e653756e3d
applied some suggestions from Sonar Lint
2021-10-04 18:40:07 +02:00
Michele Artini
0a9ef34b56
test
2021-10-04 15:46:12 +02:00
Michele Artini
31a6ad1d79
optimization of verifySubsriptions()
2021-10-04 12:01:56 +02:00
dimitrispie
3f25d2efb2
Merge branch 'beta' of https://code-repo.d4science.org/antonis.lempesis/dnet-hadoop into beta
2021-10-01 16:03:48 +03:00
dimitrispie
13687fd887
Sprint 3 indicators update
2021-10-01 16:02:02 +03:00
Miriam Baglioni
9814c3e700
merging with branch beta
2021-10-01 13:00:03 +02:00
Miriam Baglioni
c4ccd7b32c
-
2021-10-01 12:59:47 +02:00
Miriam Baglioni
c8321ad31a
merge with branch beta
2021-10-01 12:59:08 +02:00
Claudio Atzori
b01cd521b0
removed configuration specifying the limit to 8 for spark.dynamicAllocation.maxExecutors
2021-10-01 11:26:33 +02:00
Claudio Atzori
ec94cc9b93
IndexNotificationsJob test: persist contents on HDFS instead of passing them to ES
2021-10-01 09:41:27 +02:00
Claudio Atzori
60a6a9a583
[graph2hive] added field 'measures' to the result view
2021-09-30 09:27:26 +02:00
Sandro La Bruzzo
66702b1973
Added node to update datacite
2021-09-28 08:59:06 +02:00
Sandro La Bruzzo
477cb10715
Merge remote-tracking branch 'origin/beta' into beta
2021-09-27 16:57:23 +02:00
Sandro La Bruzzo
be79d74e3d
Fixed DoiBoost generation to point to correct organization in affiliation relation
2021-09-27 16:57:04 +02:00
Claudio Atzori
474117c2e8
Merge branch 'beta' into dedup_whitelist
2021-09-27 16:41:25 +02:00
Miriam Baglioni
476a4708d6
merging with branch beta
2021-09-27 16:02:32 +02:00
Miriam Baglioni
5ec69889db
OpenCitations: creation of AS from OC
2021-09-27 16:02:06 +02:00
Claudio Atzori
a53acfbc06
Merge pull request '[stats] updates in the mapping, indicators, wf' ( #145 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: D-Net/dnet-hadoop#145
2021-09-27 15:59:54 +02:00
Alessia Bardi
b924276e18
tests to generate records for the EOSC-Future demo with the EOSC Jupyter Notebook subject
2021-09-24 17:11:56 +02:00
Antonis Lempesis
a1e1cf32d7
fixed an impala error
2021-09-24 12:57:24 +03:00
Antonis Lempesis
f358cabb2b
fixed typo
2021-09-22 21:50:37 +03:00
Miriam Baglioni
eedf7c3310
merging with branch beta
2021-09-22 15:18:34 +02:00
Miriam Baglioni
f2118d771a
first steps in the implementation of the integration of opencitations
2021-09-22 15:18:05 +02:00
Claudio Atzori
7fa60e166e
Merge branch 'beta' into dedup_whitelist
2021-09-22 11:31:18 +02:00
Antonis Lempesis
421d55265d
created hive action for observatory queries
2021-09-21 03:07:58 +03:00
Enrico Ottonello
92a63f78fe
multiple download attempts handling if a connection to orcid server fails
2021-09-20 18:25:00 +02:00
Enrico Ottonello
0c74f5667e
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-09-20 18:12:31 +02:00
miconis
853333bdde
implementation of the whitelist for similarity relations
2021-09-20 16:21:47 +02:00
Antonis Lempesis
8b681dcf1b
attempt to make the observatory wf run in hive
2021-09-18 00:35:14 +03:00
Antonis Lempesis
2943287d10
fixed the definition of cc_licence, part II
2021-09-16 15:59:06 +03:00
Antonis Lempesis
dd2329849f
fixed the definition of cc_licence
2021-09-16 13:50:34 +03:00
Claudio Atzori
09c2eb7f62
Merge branch 'beta' into clean_relations
2021-09-16 11:09:47 +02:00
Miriam Baglioni
e9ccdf853f
related to D-Net/dnet-hadoop#132
2021-09-15 18:44:54 +02:00
Claudio Atzori
12766bf5f2
Merge branch 'beta' into clean_relations
2021-09-15 17:18:15 +02:00
Claudio Atzori
663b1556d7
manually integrating PR#140 D-Net/dnet-hadoop#140
2021-09-15 16:40:25 +02:00
Claudio Atzori
ebf53a1616
added cleaning for relation fields: subRelType & relClass according to dedicated vocabs
2021-09-15 16:10:37 +02:00
Enrico Ottonello
8b804e7fe1
removed unused imports
2021-09-14 17:30:52 +02:00
Enrico Ottonello
aefa36c54b
other task executions go ahead if UnknownHostException happens on a single task
2021-09-14 17:26:15 +02:00
Antonis Lempesis
de9bf3a161
added cc_licences and abstracts in observatory db
2021-09-14 01:29:08 +03:00
Antonis Lempesis
9b1936701c
fixed yet another typo
2021-09-13 21:07:44 +03:00
Antonis Lempesis
8fc89ae822
moved context table creation before indicators
2021-09-13 14:33:23 +03:00
Antonis Lempesis
461bf90ca6
fixed the gold_oa definition
2021-09-13 11:10:30 +03:00
Antonis Lempesis
43852bac0e
creating other::other concept for all contexts
2021-09-13 01:36:41 +03:00
Antonis Lempesis
f13cca7e83
moved dependencies of indicators before them...
2021-09-08 23:07:58 +03:00
Antonis Lempesis
c6ada217a1
fixed typo
2021-09-08 22:34:59 +03:00
Antonis Lempesis
1250ae197f
using new indicators for the definition of peerreviewed, gold, and green
2021-09-08 14:08:43 +03:00
Antonis Lempesis
ccee451dde
added indicators of sprint 2 in monitor db
2021-09-07 23:17:13 +03:00
Sandro La Bruzzo
aed29156c7
changed behavior of the transformation job so that it doesn't fail at the first error
2021-09-07 19:05:46 +02:00
Sandro La Bruzzo
370dddb2fa
fix bug in the OAI iterator that skipped cleaned records
2021-09-07 11:20:41 +02:00
Sandro La Bruzzo
3c6fc2096c
fix bug in the OAI iterator that skipped cleaned records
2021-09-07 10:46:26 +02:00
Sandro La Bruzzo
d4dadf6d77
reduced max number of PID in Relatedentity
2021-09-02 14:21:24 +02:00
Sandro La Bruzzo
9f8a80deb7
fixed wrong import of unresolved relation in openaire
2021-09-01 14:16:27 +02:00
Alessia Bardi
3762b17f7b
added VERSION and PART relationships and re-ordered according to my personal and obviously possibly biased ordering
2021-08-31 20:20:05 +02:00
Sandro La Bruzzo
e8b3cb9147
Implemented method to download delta updates in EBI Links
2021-08-30 09:32:45 +02:00
Alessia Bardi
ccf4103a25
keep the original url if the decoder fails for any reason
2021-08-25 10:07:58 +02:00
Sandro La Bruzzo
45898c71ac
fixed wrong doi in pubmed
2021-08-24 15:20:04 +02:00
Alessia Bardi
00a28c0080
originalId was renamed to acronym
2021-08-23 15:02:21 +02:00
Alessia Bardi
f19b04d41b
code formatting after mvn compile
2021-08-23 14:33:39 +02:00
Alessia Bardi
931f430129
Merge branch 'beta' into datasource_model_eosc_beta
2021-08-23 11:57:21 +02:00
Alessia Bardi
4c1474e693
Dealing with #6859#note-2: we have to decode URLs to avoid & and other chars encoded because of the original XML representation of data
2021-08-20 17:03:30 +02:00
Miriam Baglioni
5f8ccbc365
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-08-20 11:13:47 +02:00
Miriam Baglioni
882abb40e4
CrossrefDump -
2021-08-20 11:12:53 +02:00
Miriam Baglioni
45c62609af
CrossrefDump - modified because parameter file was moved
2021-08-20 11:12:31 +02:00
Miriam Baglioni
35880c0e7b
CrossrefDump - changed the wf to be able to resume from one of the steps
2021-08-20 11:11:35 +02:00
Miriam Baglioni
f3b6c392c1
CrossrefDump - moving parameter file under folder crossref_dump_reader
2021-08-20 11:10:58 +02:00
Miriam Baglioni
65822400ce
CrossrefDump - added new parameter file that was missing
2021-08-20 11:10:35 +02:00
Alessia Bardi
a053e1513c
different funders in blacklist from BETA and PROD aggregator
2021-08-19 11:32:27 +02:00
Alessia Bardi
812bd54c57
different funders in blacklist from BETA and PROD aggregator
2021-08-19 11:30:14 +02:00
Miriam Baglioni
a65d3caaea
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-08-19 10:29:10 +02:00
Miriam Baglioni
e5cf11d088
change open access route to result matching hbm to gold
2021-08-19 10:29:04 +02:00
Claudio Atzori
7c0c67bdd6
added mock pom
2021-08-13 17:45:53 +02:00
Claudio Atzori
82086f3422
fixed directory name
2021-08-13 17:42:14 +02:00
Claudio Atzori
bc7068106c
added crossref download oozie workflow
2021-08-13 17:19:44 +02:00
Claudio Atzori
2c0a05f11a
manually merged PR#139
2021-08-13 17:15:53 +02:00
Claudio Atzori
d43667d857
Merge pull request 'Automatic download of Crossref' ( #138 ) from crossref_dw_wf into beta
...
Reviewed-on: D-Net/dnet-hadoop#138
2021-08-13 17:10:10 +02:00
Miriam Baglioni
5856ca8a7b
merging with branch beta - resolved conflicts
2021-08-13 16:45:45 +02:00
Miriam Baglioni
6fec71e8d2
removed the specifics of the infra we are running the wf on from the wf name
2021-08-13 16:39:02 +02:00
Miriam Baglioni
ed7e28490a
change in sh
2021-08-13 16:19:01 +02:00
Claudio Atzori
7743d0f919
consolidated dnet wf profiles into the same submodule
2021-08-13 16:14:54 +02:00
Miriam Baglioni
6eb7508995
merging with branch beta
2021-08-13 16:07:04 +02:00
Claudio Atzori
f74adc4752
added DownloadCSV2 as alternative implementation of the same download procedure
2021-08-13 15:52:15 +02:00
Claudio Atzori
5f0903d50d
fixed CSV downloader & tests
2021-08-13 14:17:54 +02:00
Claudio Atzori
17cefe6a97
[HBM] removed stale replace option
2021-08-13 12:43:59 +02:00
Claudio Atzori
7ee2757fcd
fixed DownloadCSV parameters spec; workflow patching the hostedby replaces the graph content (publication, datasource) rather than creating a copy
2021-08-13 12:41:01 +02:00
Claudio Atzori
c3ad4ab701
minor fixes
2021-08-13 12:23:15 +02:00
Claudio Atzori
baed5e3337
test classes moved in specific components
2021-08-13 12:14:47 +02:00
Claudio Atzori
3359f73fcf
cleanup & best practices
2021-08-13 12:00:42 +02:00
Miriam Baglioni
f4ec81c92c
merging with branch beta
2021-08-13 10:31:35 +02:00
Miriam Baglioni
dc8b05b39e
Hosted By Map - changed the association with the datasource id for the hostedby element: there is no longer any need to compute it. With the new HBM it is already the id in the graph
2021-08-13 10:18:25 +02:00
Miriam Baglioni
32fd75691f
refactoring
2021-08-13 10:15:42 +02:00
Miriam Baglioni
01db1f8bc4
GetCSV refactoring - removed not needed import
2021-08-13 10:14:17 +02:00
Miriam Baglioni
964a46ca21
GetCSV refactoring - modified due to movement of classes
2021-08-13 10:11:18 +02:00
Miriam Baglioni
eaf077fc34
GetCSV refactoring - removed not needed dependency
2021-08-13 10:08:58 +02:00
Miriam Baglioni
5f674efb0c
moved dependency version in external pom
2021-08-13 10:07:53 +02:00
Miriam Baglioni
5cd5714530
GetCSV refactoring - added ignore annotation for fields not in input csv
2021-08-13 10:06:49 +02:00
Miriam Baglioni
ed183d878e
GetCSV refactoring - modified test classes due to change in the model of projects and programme
2021-08-13 09:28:51 +02:00
Miriam Baglioni
8769dd8eef
GetCSV refactoring - refactoring due to movement of classes
2021-08-12 18:20:56 +02:00
Miriam Baglioni
6b9e1bf2e3
GetCSV refactoring - removing not needed dependency
2021-08-12 18:17:50 +02:00
Miriam Baglioni
d57b2bb927
GetCSV refactoring - removing not needed dependency
2021-08-12 18:12:51 +02:00
Miriam Baglioni
9da74b544a
GetCSV refactoring - refactoring due to movement of classes
2021-08-12 18:12:15 +02:00
Miriam Baglioni
ab8abd61bb
GetCSV refactoring - refactoring due to movement of classes
2021-08-12 18:11:07 +02:00
Miriam Baglioni
335a824e34
GetCSV refactoring - fixed issue
2021-08-12 18:10:10 +02:00
Miriam Baglioni
f0845e9865
GetCSV refactoring - refactoring due to movement of classes
2021-08-12 18:04:58 +02:00
Miriam Baglioni
7a789423aa
GetCSV refactoring - refactoring due to movement of classes
2021-08-12 18:04:27 +02:00
Miriam Baglioni
e9fc3ef3bc
GetCSV refactoring - changed to use the new class to get and write the csv file
2021-08-12 18:03:41 +02:00
Miriam Baglioni
4317211a2b
GetCSV refactoring - refactoring due to movement
2021-08-12 18:03:14 +02:00
Miriam Baglioni
b62cd656a7
GetCSV refactoring - changed the model to store only the information needed
2021-08-12 18:01:10 +02:00
Miriam Baglioni
d36e925277
GetCSV refactoring - moved under model package
2021-08-12 18:00:21 +02:00
Miriam Baglioni
6e84b3951f
GetCSV refactoring - moving classes to dhp-common that have dependency with GetCSV class (that was located in graph-mapper)
2021-08-12 17:57:41 +02:00
Claudio Atzori
9587d4aee8
Merge branch 'beta' into hostedbymap
2021-08-12 17:04:30 +02:00
Claudio Atzori
86d940044c
added test to verify bad records from FWF-E-Book-Library
2021-08-12 11:32:56 +02:00
Claudio Atzori
8cdce59e0e
[graph raw] let the mapping exceptions propagate
2021-08-12 11:32:26 +02:00
Miriam Baglioni
08dd2b2102
moving the dependency version to the external pom file
2021-08-11 18:09:41 +02:00
Miriam Baglioni
ac417ca798
removed not needed test resource
2021-08-11 17:50:33 +02:00
Miriam Baglioni
e33daaeee8
reverting
2021-08-11 17:46:19 +02:00
Miriam Baglioni
785db1d5b2
refactoring
2021-08-11 17:44:07 +02:00
Miriam Baglioni
95e5482bbb
removing not needed dependency
2021-08-11 17:42:26 +02:00
Miriam Baglioni
b966329833
reverting
2021-08-11 17:37:00 +02:00
Miriam Baglioni
8ad7c71417
reverting
2021-08-11 17:36:12 +02:00
Miriam Baglioni
0e1a6bec20
reverting
2021-08-11 17:32:29 +02:00
Miriam Baglioni
c6a2a780a9
reverting
2021-08-11 17:30:17 +02:00
Miriam Baglioni
b6b58bba28
reverting
2021-08-11 17:25:37 +02:00
Miriam Baglioni
804589eb30
reverting
2021-08-11 17:23:35 +02:00
Miriam Baglioni
d688749ad9
reverting
2021-08-11 17:22:28 +02:00
Miriam Baglioni
524c06e028
reverting
2021-08-11 17:20:30 +02:00
Miriam Baglioni
7aa3260729
reverting
2021-08-11 17:18:45 +02:00
Miriam Baglioni
55fc500d8d
reverting
2021-08-11 17:17:48 +02:00
Miriam Baglioni
8229632839
adding assertions to the mapping of the unibi part of gold list
2021-08-11 16:36:01 +02:00
Miriam Baglioni
b1c6140ebf
removed all comments in Italian
2021-08-11 16:23:33 +02:00
Miriam Baglioni
52c18c2697
removed not needed test class. The functionality has been moved
2021-08-11 16:16:55 +02:00