Miriam Baglioni
eedf7c3310
merging with branch beta
2021-09-22 15:18:34 +02:00
Miriam Baglioni
f2118d771a
first steps in the implementation of the integration of opencitations
2021-09-22 15:18:05 +02:00
Claudio Atzori
7fa60e166e
Merge branch 'beta' into dedup_whitelist
2021-09-22 11:31:18 +02:00
Antonis Lempesis
421d55265d
created hive action for observatory queries
2021-09-21 03:07:58 +03:00
Enrico Ottonello
92a63f78fe
multiple download attempts handling if a connection to orcid server fails
2021-09-20 18:25:00 +02:00
Enrico Ottonello
0c74f5667e
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-09-20 18:12:31 +02:00
miconis
853333bdde
implementation of the whitelist for similarity relations
2021-09-20 16:21:47 +02:00
Antonis Lempesis
8b681dcf1b
attempt to make the observatory wf run in hive
2021-09-18 00:35:14 +03:00
Antonis Lempesis
2943287d10
fixed the definition of cc_licence, part II
2021-09-16 15:59:06 +03:00
Antonis Lempesis
dd2329849f
fixed the definition of cc_licence
2021-09-16 13:50:34 +03:00
Claudio Atzori
09c2eb7f62
Merge branch 'beta' into clean_relations
2021-09-16 11:09:47 +02:00
Miriam Baglioni
e9ccdf853f
related to D-Net/dnet-hadoop#132
2021-09-15 18:44:54 +02:00
Claudio Atzori
12766bf5f2
Merge branch 'beta' into clean_relations
2021-09-15 17:18:15 +02:00
Claudio Atzori
663b1556d7
manually integrating PR#140 D-Net/dnet-hadoop#140
2021-09-15 16:40:25 +02:00
Claudio Atzori
ebf53a1616
added cleaning for relation fields: subRelType & relClass according to dedicated vocabs
2021-09-15 16:10:37 +02:00
Enrico Ottonello
8b804e7fe1
removed unused imports
2021-09-14 17:30:52 +02:00
Enrico Ottonello
aefa36c54b
other task executions go ahead if UnknownHostException happens on a single task
2021-09-14 17:26:15 +02:00
Antonis Lempesis
de9bf3a161
added cc_licences and abstracts in observatory db
2021-09-14 01:29:08 +03:00
Antonis Lempesis
9b1936701c
fixed yet another typo
2021-09-13 21:07:44 +03:00
Antonis Lempesis
8fc89ae822
moved context table creation before indicators
2021-09-13 14:33:23 +03:00
Antonis Lempesis
461bf90ca6
fixed the gold_oa definition
2021-09-13 11:10:30 +03:00
Antonis Lempesis
43852bac0e
creating other::other concept for all contexts
2021-09-13 01:36:41 +03:00
Antonis Lempesis
f13cca7e83
moved dependencies of indicators before them...
2021-09-08 23:07:58 +03:00
Antonis Lempesis
c6ada217a1
fixed typo
2021-09-08 22:34:59 +03:00
Antonis Lempesis
1250ae197f
using new indicators for the definition of peerreviewed, gold, and green
2021-09-08 14:08:43 +03:00
Antonis Lempesis
ccee451dde
added indicators of sprint 2 in monitor db
2021-09-07 23:17:13 +03:00
Sandro La Bruzzo
aed29156c7
changed behavior of the transformation job so that it doesn't fail at the first error
2021-09-07 19:05:46 +02:00
Sandro La Bruzzo
370dddb2fa
fix bug in oai iterator that skips cleaned records
2021-09-07 11:20:41 +02:00
Sandro La Bruzzo
3c6fc2096c
fix bug in oai iterator that skips cleaned records
2021-09-07 10:46:26 +02:00
Sandro La Bruzzo
d4dadf6d77
reduced max number of PID in Relatedentity
2021-09-02 14:21:24 +02:00
Sandro La Bruzzo
9f8a80deb7
fixed wrong import of unresolved relation in openaire
2021-09-01 14:16:27 +02:00
Alessia Bardi
3762b17f7b
added VERSION and PART relationship and re-ordered according to my personal and obviously possibly biased ordering
2021-08-31 20:20:05 +02:00
Sandro La Bruzzo
e8b3cb9147
Implemented method to download delta updates in EBI Links
2021-08-30 09:32:45 +02:00
Alessia Bardi
ccf4103a25
keep the original url if the decoder fails for any reason
2021-08-25 10:07:58 +02:00
Sandro La Bruzzo
45898c71ac
fixed wrong doi in pubmed
2021-08-24 15:20:04 +02:00
Alessia Bardi
00a28c0080
originalId was renamed to acronym
2021-08-23 15:02:21 +02:00
Alessia Bardi
f19b04d41b
code formatting after mvn compile
2021-08-23 14:33:39 +02:00
Alessia Bardi
931f430129
Merge branch 'beta' into datasource_model_eosc_beta
2021-08-23 11:57:21 +02:00
Alessia Bardi
4c1474e693
Dealing with #6859#note-2: we have to decode URLs to avoid & and other chars encoded because of the original XML representation of data
2021-08-20 17:03:30 +02:00
Miriam Baglioni
5f8ccbc365
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-08-20 11:13:47 +02:00
Miriam Baglioni
882abb40e4
CrossrefDump -
2021-08-20 11:12:53 +02:00
Miriam Baglioni
45c62609af
CrossrefDump - modified because parameter file was moved
2021-08-20 11:12:31 +02:00
Miriam Baglioni
35880c0e7b
CrossrefDump - changed the wf to be able to resume from one of the steps
2021-08-20 11:11:35 +02:00
Miriam Baglioni
f3b6c392c1
CrossrefDump - moving parameter file under folder crossref_dump_reader
2021-08-20 11:10:58 +02:00
Miriam Baglioni
65822400ce
CrossrefDump - added new parameter file that was missing
2021-08-20 11:10:35 +02:00
Alessia Bardi
a053e1513c
different funders in blacklist from BETA and PROD aggregator
2021-08-19 11:32:27 +02:00
Alessia Bardi
812bd54c57
different funders in blacklist from BETA and PROD aggregator
2021-08-19 11:30:14 +02:00
Miriam Baglioni
a65d3caaea
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-08-19 10:29:10 +02:00
Miriam Baglioni
e5cf11d088
change open access route to result matching hbm to gold
2021-08-19 10:29:04 +02:00
Claudio Atzori
7c0c67bdd6
added mock pom
2021-08-13 17:45:53 +02:00
Claudio Atzori
82086f3422
fixed directory name
2021-08-13 17:42:14 +02:00
Claudio Atzori
bc7068106c
added crossref download oozie workflow
2021-08-13 17:19:44 +02:00
Claudio Atzori
2c0a05f11a
manually merged PR#139
2021-08-13 17:15:53 +02:00
Claudio Atzori
d43667d857
Merge pull request 'Automatic download of Crossref' ( #138 ) from crossref_dw_wf into beta
Reviewed-on: D-Net/dnet-hadoop#138
2021-08-13 17:10:10 +02:00
Miriam Baglioni
5856ca8a7b
merging with branch beta - resolved conflicts
2021-08-13 16:45:45 +02:00
Miriam Baglioni
6fec71e8d2
removed the specific of the infra we are running the wf from the wf name
2021-08-13 16:39:02 +02:00
Miriam Baglioni
ed7e28490a
change in sh
2021-08-13 16:19:01 +02:00
Claudio Atzori
7743d0f919
consolidated dnet wf profiles into the same submodule
2021-08-13 16:14:54 +02:00
Miriam Baglioni
6eb7508995
merging with branch beta
2021-08-13 16:07:04 +02:00
Claudio Atzori
f74adc4752
added DownloadCSV2 as alternative implementation of the same download procedure
2021-08-13 15:52:15 +02:00
Claudio Atzori
5f0903d50d
fixed CSV downloader & tests
2021-08-13 14:17:54 +02:00
Claudio Atzori
17cefe6a97
[HBM] removed stale replace option
2021-08-13 12:43:59 +02:00
Claudio Atzori
7ee2757fcd
fixed DownloadCSV parameters spec; workflow patching the hostedby replaces the graph content (publication, datasource) rather than creating a copy
2021-08-13 12:41:01 +02:00
Claudio Atzori
c3ad4ab701
minor fixes
2021-08-13 12:23:15 +02:00
Claudio Atzori
baed5e3337
test classes moved in specific components
2021-08-13 12:14:47 +02:00
Claudio Atzori
3359f73fcf
cleanup & best practices
2021-08-13 12:00:42 +02:00
Miriam Baglioni
f4ec81c92c
merging with branch beta
2021-08-13 10:31:35 +02:00
Miriam Baglioni
dc8b05b39e
Hosted By Map - changed the association with the datasource id for the hostedby element: there is no longer a need to compute it. With the new HBM it is already the id in the graph
2021-08-13 10:18:25 +02:00
Miriam Baglioni
32fd75691f
refactoring
2021-08-13 10:15:42 +02:00
Miriam Baglioni
01db1f8bc4
GetCSV refactoring - removed not needed import
2021-08-13 10:14:17 +02:00
Miriam Baglioni
964a46ca21
GetCSV refactoring - modified due to movement of classes
2021-08-13 10:11:18 +02:00
Miriam Baglioni
eaf077fc34
GetCSV refactoring - removed not needed dependency
2021-08-13 10:08:58 +02:00
Miriam Baglioni
5f674efb0c
moved dependency version in external pom
2021-08-13 10:07:53 +02:00
Miriam Baglioni
5cd5714530
GetCSV refactoring - added ignore annotation for fields not in input csv
2021-08-13 10:06:49 +02:00
Miriam Baglioni
ed183d878e
GetCSV refactoring - modified test classes due to change in the model of projects and programme
2021-08-13 09:28:51 +02:00
Miriam Baglioni
8769dd8eef
GetCSV refactoring - refactoring due to movement of classes
2021-08-12 18:20:56 +02:00
Miriam Baglioni
6b9e1bf2e3
GetCSV refactoring - removing not needed dependency
2021-08-12 18:17:50 +02:00
Miriam Baglioni
d57b2bb927
GetCSV refactoring - removing not needed dependency
2021-08-12 18:12:51 +02:00
Miriam Baglioni
9da74b544a
GetCSV refactoring - refactoring due to movement of classes
2021-08-12 18:12:15 +02:00
Miriam Baglioni
ab8abd61bb
GetCSV refactoring - refactoring due to movement of classes
2021-08-12 18:11:07 +02:00
Miriam Baglioni
335a824e34
GetCSV refactoring - fixed issue
2021-08-12 18:10:10 +02:00
Miriam Baglioni
f0845e9865
GetCSV refactoring - refactoring due to movement of classes
2021-08-12 18:04:58 +02:00
Miriam Baglioni
7a789423aa
GetCSV refactoring - refactoring due to movement of classes
2021-08-12 18:04:27 +02:00
Miriam Baglioni
e9fc3ef3bc
GetCSV refactoring - changed to use the new class to get and write the csv file
2021-08-12 18:03:41 +02:00
Miriam Baglioni
4317211a2b
GetCSV refactoring - refactoring due to movement
2021-08-12 18:03:14 +02:00
Miriam Baglioni
b62cd656a7
GetCSV refactoring - changed the model to store only the information needed
2021-08-12 18:01:10 +02:00
Miriam Baglioni
d36e925277
GetCSV refactoring - moved under model package
2021-08-12 18:00:21 +02:00
Miriam Baglioni
6e84b3951f
GetCSV refactoring - moving classes to dhp-common that have dependency with GetCSV class (that was located in graph-mapper)
2021-08-12 17:57:41 +02:00
Claudio Atzori
9587d4aee8
Merge branch 'beta' into hostedbymap
2021-08-12 17:04:30 +02:00
Claudio Atzori
86d940044c
added test to verify bad records from FWF-E-Book-Library
2021-08-12 11:32:56 +02:00
Claudio Atzori
8cdce59e0e
[graph raw] let the mapping exceptions propagate
2021-08-12 11:32:26 +02:00
Miriam Baglioni
08dd2b2102
moving the dependency version to the external pom file
2021-08-11 18:09:41 +02:00
Miriam Baglioni
ac417ca798
removed not needed test resource
2021-08-11 17:50:33 +02:00
Miriam Baglioni
e33daaeee8
reverting
2021-08-11 17:46:19 +02:00
Miriam Baglioni
785db1d5b2
refactoring
2021-08-11 17:44:07 +02:00
Miriam Baglioni
95e5482bbb
removing not needed dependency
2021-08-11 17:42:26 +02:00
Miriam Baglioni
b966329833
reverting
2021-08-11 17:37:00 +02:00
Miriam Baglioni
8ad7c71417
reverting
2021-08-11 17:36:12 +02:00
Miriam Baglioni
0e1a6bec20
reverting
2021-08-11 17:32:29 +02:00
Miriam Baglioni
c6a2a780a9
reverting
2021-08-11 17:30:17 +02:00
Miriam Baglioni
b6b58bba28
reverting
2021-08-11 17:25:37 +02:00
Miriam Baglioni
804589eb30
reverting
2021-08-11 17:23:35 +02:00
Miriam Baglioni
d688749ad9
reverting
2021-08-11 17:22:28 +02:00
Miriam Baglioni
524c06e028
reverting
2021-08-11 17:20:30 +02:00
Miriam Baglioni
7aa3260729
reverting
2021-08-11 17:18:45 +02:00
Miriam Baglioni
55fc500d8d
reverting
2021-08-11 17:17:48 +02:00
Miriam Baglioni
8229632839
adding assertions to the mapping of the unibi part of gold list
2021-08-11 16:36:01 +02:00
Miriam Baglioni
b1c6140ebf
removed all comments in Italian
2021-08-11 16:23:33 +02:00
Miriam Baglioni
52c18c2697
removed not needed test class. The functionality has been moved
2021-08-11 16:16:55 +02:00
Miriam Baglioni
8da3a25cf6
merging with branch beta
2021-08-11 15:55:34 +02:00
Claudio Atzori
9f4db73f30
updated/fixed unit tests
2021-08-11 15:02:51 +02:00
Claudio Atzori
61d811ba53
suggestions from intellij
2021-08-11 12:18:20 +02:00
Claudio Atzori
2ee21da43b
suggestions from SonarLint
2021-08-11 12:13:22 +02:00
Miriam Baglioni
b954fe9ba8
merging with branch beta
2021-08-11 10:12:46 +02:00
Miriam Baglioni
b688567db5
hostedbymap - modified part of test to check the bestaccessright changed
2021-08-11 10:12:10 +02:00
Miriam Baglioni
9731a6144a
hostedbymap - in case the journal is open access the access may be changed also for the best access right in the result
2021-08-10 17:49:45 +02:00
Miriam Baglioni
a90bac3bc9
Graph Dump - added method to test class to verify addition of validation date in projects for community result
2021-08-09 16:36:54 +02:00
Miriam Baglioni
bd0d7bfba7
Graph Dump - added resources for testing addition of validation date in project for communityresult
2021-08-09 16:36:17 +02:00
Miriam Baglioni
8daaa32e90
Graph Dump - added resources for testing
2021-08-09 15:46:29 +02:00
Miriam Baglioni
bc9e3a06ba
Graph Dump - extended the test class
2021-08-09 15:46:06 +02:00
Claudio Atzori
d64a942a76
fixed MappersTest
2021-08-09 12:32:26 +02:00
Miriam Baglioni
2efa5abda5
refactoring
2021-08-09 12:28:36 +02:00
Claudio Atzori
577f3b1ac8
added dnet workflows responsible for the graph construction, enrichment, provision
2021-08-09 11:53:58 +02:00
Miriam Baglioni
da20fceaf7
removed all the part related to the crossref dump download since it is done in a separate workflow
2021-08-09 11:53:45 +02:00
Claudio Atzori
964f97ed4d
cleanup
2021-08-09 11:53:06 +02:00
Miriam Baglioni
54a6cbb244
CrossrefDump - put token among the parameters
2021-08-09 11:41:10 +02:00
Miriam Baglioni
b7079804cb
CrossrefDump - put token among the parameters
2021-08-09 11:34:35 +02:00
Miriam Baglioni
a5f82f442b
Merge branch 'beta' into doiboost_wf
2021-08-09 11:17:51 +02:00
Miriam Baglioni
b6dcf89d22
merging with branch beta
2021-08-09 11:14:43 +02:00
Miriam Baglioni
eff499af9f
added new tests and changed the test example
2021-08-09 11:12:30 +02:00
Claudio Atzori
a45b95ccc1
resolving conflicts for PR#134
2021-08-09 10:50:03 +02:00
Miriam Baglioni
5d70f842eb
merging with branch beta
2021-08-06 18:57:09 +02:00
Miriam Baglioni
c3931557e3
extended the logic of the dump to consider the validation date in the relation (also in the dumped result for communities and funders at the level of the project), the extension on the instance for the APC, the pid, the alternate identifiers, and the extension of the AccessRight to store the OpenAccessRoute. Added new resource for testing and extended the old class to verify the new dump. Also fixed an issue in the relation dump: only relations whose source and target are entities in the graph are dumped. The same holds for references to projects
2021-08-06 18:56:18 +02:00
Claudio Atzori
66f398fe6f
Merge pull request '[stats] fixed a typo' ( #133 ) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: D-Net/dnet-hadoop#133
2021-08-06 14:29:57 +02:00
Miriam Baglioni
6bd1eca7e0
merge branch with beta
2021-08-05 15:23:32 +02:00
Miriam Baglioni
73dc082927
added new dumped field (openaccessroute, pid and alternate identifier at the level of the instance) and the bipFinder measure at the level of the result
2021-08-05 15:20:50 +02:00
Miriam Baglioni
ee13da9258
merge branch with master
2021-08-05 11:34:20 +02:00
Miriam Baglioni
bd096f5170
removed not needed param file
2021-08-05 10:55:43 +02:00
Miriam Baglioni
5faeefbda8
added script to download the dump, changed the workflow input parameters
2021-08-05 10:54:03 +02:00
Miriam Baglioni
1965e4eece
new workflow for downloading the dump of crossref and unpack it
2021-08-04 18:29:03 +02:00
Claudio Atzori
83c04e5d28
mapping test for dataset records adapted to reflect the delegated pid authority (zenodo)
2021-08-04 10:37:57 +02:00
Miriam Baglioni
b4eb026c8b
merging with branch beta
2021-08-04 10:21:37 +02:00
Miriam Baglioni
c7b71647c6
Hosted By Map - modification of the resource for testing the presence of only one entry per datasource id
2021-08-04 10:20:02 +02:00
Miriam Baglioni
eb8c3f8594
Hosted By Map - test modified because of the application of the new aggregator on datasources
2021-08-04 10:19:17 +02:00
Miriam Baglioni
e94ae0b1de
Hosted By Map - extension of the workflow to also consider the application of the map to publications and datasources
2021-08-04 10:18:11 +02:00
Miriam Baglioni
67ba4c40e0
Hosted By Map - added parameter resources
2021-08-04 10:17:28 +02:00
Miriam Baglioni
eccf3851b0
Hosted By Map - refactoring
2021-08-04 10:16:30 +02:00
Sandro La Bruzzo
74afe43c3a
fixed wrong test file
2021-08-04 10:16:17 +02:00
Miriam Baglioni
1e952cccf6
Hosted By Map - refactoring and deletion of not needed methods
2021-08-04 10:15:43 +02:00
Miriam Baglioni
8ba8c77f92
Hosted By Map - refactoring
2021-08-04 10:14:57 +02:00
Miriam Baglioni
8f7623e77a
Hosted By Map - refactoring and application of the new aggregator
2021-08-04 10:14:20 +02:00
Sandro La Bruzzo
3fc820203b
fixed wrong test file
2021-08-04 10:13:59 +02:00
Miriam Baglioni
a7bf314fd2
Hosted By Map - added new aggregator to get just one result per datasource id
2021-08-04 10:13:30 +02:00
Miriam Baglioni
9831725073
Hosted By Map - removed a step that is not needed from the workflow. The hbm will also take care of the integration of the unibi list of gold openaccess journals
2021-08-03 11:02:17 +02:00
Miriam Baglioni
100e54e6c8
merging with branch beta
2021-08-03 10:47:11 +02:00
Miriam Baglioni
461b8a29a0
removed not needed class
2021-08-03 10:46:51 +02:00
Miriam Baglioni
327cddde33
Hosted By Map - refactoring
2021-08-03 10:44:13 +02:00
Miriam Baglioni
17292c6641
Hosted By Map - resources for testing purposes
2021-08-02 19:37:08 +02:00
Miriam Baglioni
ee7ccb98dc
Hosted By Map - test class to verify the application of the hbm to results and datasource
2021-08-02 19:36:18 +02:00
Miriam Baglioni
90e91486e2
Hosted By Map - test class to verify each step in the preparation process
2021-08-02 19:35:52 +02:00
Miriam Baglioni
1e859706a3
Hosted By Map - Classes to apply the HBM to results and datasources
2021-08-02 19:35:23 +02:00
Miriam Baglioni
72df8f9232
Hosted By Map - removed the aggregator for the datasource (it is no longer needed) and added a new aggregator for the results. Also changed the hostedBYMap aggregator
2021-08-02 19:34:44 +02:00
Miriam Baglioni
ff1ce75e33
Hosted By Map - modification in the code to prepare the info needed to apply the HostedByMap. There is no need to join datasources with the hbm: all the information needed is in the hosted by map already
2021-08-02 19:32:59 +02:00
Claudio Atzori
e826aae848
using constants from ModelConstants
2021-08-02 14:28:59 +02:00
Antonis Lempesis
117c3d5c67
fixed a typo
2021-08-02 12:15:58 +03:00
Miriam Baglioni
1695d45bd4
Hosted By Map - Test class to verify the preparation of the intermediate information
2021-07-30 17:57:01 +02:00
Miriam Baglioni
7c6ea2f4c7
Hosted By Map - first attempt at the creation of intermediate information to be used to apply the hosted by map on the graph entities
2021-07-30 17:56:27 +02:00
Miriam Baglioni
d8b9b0553b
Hosted By Map - model classes to store the intermediate information to be used to apply the hosted by map
2021-07-30 17:55:39 +02:00
Miriam Baglioni
613bd3bde0
Hosted By Map - refactor of the first attempt to prepare a new hosted by map dependent on the datasource in the graph and on two external sources: the gold list from unibi and the doaj list of open access journals. Both lists are downloaded from the provided url parameter
2021-07-30 17:54:45 +02:00
Miriam Baglioni
d1807781c0
merging with branch beta
2021-07-30 14:34:07 +02:00
Miriam Baglioni
1d6ac3715b
merge branch with beta
2021-07-30 11:58:29 +02:00
Claudio Atzori
19620eed46
applying PR#131, Patch the identifiers (source/target) in the relations, refinements
2021-07-30 11:09:32 +02:00
Claudio Atzori
4f78565c04
fixed implementation of PatchRelationsApplication, refined the relative unit test
2021-07-30 11:07:09 +02:00
Claudio Atzori
a6a38cca9e
fixed implementation of PatchRelationsApplication, refined the relative unit test
2021-07-30 11:06:11 +02:00
Miriam Baglioni
9bc4fd3b69
Patch FCT relations - fixed issue with join
2021-07-30 10:34:05 +02:00
Miriam Baglioni
2fc89fc9b5
Merge branch 'fct_project_id_replacement' of https://code-repo.d4science.org/D-Net/dnet-hadoop into fct_project_id_replacement
2021-07-30 10:20:43 +02:00
Claudio Atzori
081fe92a21
Merge branch 'fct_project_id_replacement' of https://code-repo.d4science.org/D-Net/dnet-hadoop into fct_project_id_replacement
2021-07-30 10:13:56 +02:00
Claudio Atzori
576693d782
added unit test for PatchRelationsApplication
2021-07-30 10:13:33 +02:00
Claudio Atzori
55e6470f44
Merge pull request 'added the sprint 2 indicators in monitor db' ( #129 ) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: D-Net/dnet-hadoop#129
2021-07-30 10:11:46 +02:00
Sandro La Bruzzo
6358f92c3a
added sleep to solve problem of lost request of creating index
2021-07-30 08:54:37 +02:00
Antonis Lempesis
26af0320d0
added the sprint 2 indicators in monitor db
2021-07-30 00:31:33 +03:00
Claudio Atzori
7b172e7cd9
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-07-29 13:57:06 +02:00
Claudio Atzori
c53d106e80
[provision] lowercase relation filter
2021-07-29 13:57:00 +02:00
Claudio Atzori
6e3554a45e
[provision] lowercase relation filter
2021-07-29 13:56:37 +02:00
Sandro La Bruzzo
b1b0cc3f15
fixed wrong package name
2021-07-29 13:55:08 +02:00
Miriam Baglioni
baad01cadc
hostedbymap
2021-07-29 13:04:39 +02:00
Claudio Atzori
e725c88ebb
[raw_all] patching relation identifier phase to be run at the end, i.e. includes also claimed relations
2021-07-29 13:03:43 +02:00
Claudio Atzori
5d08ad86ae
[raw_all] patching relation identifier phase to be run at the end, i.e. includes also claimed relations
2021-07-29 13:03:16 +02:00
Claudio Atzori
e87e1805c4
[raw_all] added extra workflow step for patching the identifiers in the relations, given an id mapping dataset
2021-07-29 12:13:06 +02:00
Claudio Atzori
5f7330d407
Merge branch 'master' into fct_project_id_replacement
2021-07-29 11:38:22 +02:00
Claudio Atzori
1923c1ce21
replaced full join + filtering with a left join
2021-07-29 11:36:20 +02:00
Claudio Atzori
dc55ed4acd
Merge pull request '[beta] stats update workflow' ( #128 ) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: D-Net/dnet-hadoop#128
2021-07-29 11:13:21 +02:00
Claudio Atzori
908f57a475
code formatting
2021-07-29 10:49:39 +02:00
Sandro La Bruzzo
3721df7aa6
refactoring create actionset of scholexplorer, moved on package dhp-aggregation
2021-07-29 10:45:35 +02:00
Antonis Lempesis
4afa5215a9
fixed a NPE?
2021-07-28 21:59:12 +03:00
Antonis Lempesis
3d1580fa9b
fixed a typo
2021-07-28 18:50:31 +03:00
Claudio Atzori
4c5a71ba2f
[broker] updated relation descriptors, making use of constant values
2021-07-28 17:11:18 +02:00
Claudio Atzori
a9961a1835
[cleaning] title cleaning based on the me.xuender:unidecode library
2021-07-28 16:36:33 +02:00
Claudio Atzori
e1797c0a42
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-07-28 16:21:36 +02:00
Claudio Atzori
6dddad86ee
[cleaning] title cleaning based on the me.xuender:unidecode library
2021-07-28 16:21:29 +02:00
Sandro La Bruzzo
3d8f0f629b
implemented workflow of creation action set for scholexplorer
2021-07-28 16:15:34 +02:00
Antonis Lempesis
9b181ffa73
added the h2020 classification scheme for projects
2021-07-28 16:31:29 +03:00
Alessia Bardi
df8715a1ec
format code after mvn compile
2021-07-28 11:58:26 +02:00
Michele Artini
3e2a2d6e71
added new fields in xml
2021-07-28 11:56:55 +02:00
Alessia Bardi
c806387d4b
tests for enermaps
2021-07-28 11:54:36 +02:00
Alessia Bardi
9594343725
code formatting after mvn compile
2021-07-28 11:41:34 +02:00
Claudio Atzori
2fff24df55
code formatting
2021-07-28 11:34:19 +02:00
Michele Artini
9f1c7b8e17
tests
2021-07-28 11:32:34 +02:00
Antonis Lempesis
4a9741825d
added result_orcid, result_project provenance, issn in datasources
2021-07-28 12:28:04 +03:00
Miriam Baglioni
3d2bba3d5d
removing not needed classes
2021-07-28 11:25:43 +02:00
Miriam Baglioni
cc0d3d8a7b
merging with branch beta
2021-07-28 11:24:46 +02:00
Michele Artini
e6f1773d63
mapping of new eosc fields
2021-07-28 11:17:11 +02:00
Miriam Baglioni
80d5b3b4de
DoiBoost AccessRight #4362 - removing commented code
2021-07-28 11:16:49 +02:00
Miriam Baglioni
5fe016dcbc
DoiBoost AccessRight #4362 - related to https://code-repo.d4science.org/D-Net/dnet-hadoop/pulls/126/files#issuecomment-4194
2021-07-28 11:14:28 +02:00
Miriam Baglioni
73ed7374a9
merging with branch beta
2021-07-28 11:05:16 +02:00
Miriam Baglioni
43e62fcae9
DoiBoost AccessRight #4362 - related to https://code-repo.d4science.org/D-Net/dnet-hadoop/pulls/126/files#issuecomment-4193
2021-07-28 11:04:55 +02:00
Michele Artini
c72c960ffb
added eosc fields
2021-07-28 11:03:15 +02:00
Michele Artini
1fb572a33a
added eosc fields
2021-07-28 10:52:24 +02:00
Miriam Baglioni
708d0ade34
Merge branch 'beta' into hostedbymap
2021-07-28 10:37:22 +02:00
Sandro La Bruzzo
16c91203bd
implemented workflow of creation action set for scholexplorer
2021-07-28 10:30:49 +02:00
Miriam Baglioni
6c936943aa
merging with branch beta
2021-07-28 10:24:48 +02:00
Miriam Baglioni
0424f47494
HostedByMap fixing issues
2021-07-28 10:24:13 +02:00
Michele Artini
52e2315ba2
removed trick for datasourcetypeui
2021-07-28 10:23:00 +02:00
Claudio Atzori
d267dce520
[raw_all] added extra workflow step for patching the identifiers in the relations, given an id mapping dataset
2021-07-27 17:18:29 +02:00
Sandro La Bruzzo
825d9f0289
fixed datacite workflow starting from Importing delta
2021-07-27 16:09:46 +02:00
Claudio Atzori
5aa7d16d1b
updated assertions in eu.dnetlib.dhp.oa.graph.raw.MappersTest
2021-07-27 15:11:58 +02:00
Claudio Atzori
998b66855a
updated assertions in eu.dnetlib.dhp.oa.graph.raw.MappersTest
2021-07-27 15:11:37 +02:00
Antonis Lempesis
1a28a69cac
changed the citeee in *_citations to cites
2021-07-27 15:14:09 +03:00
Miriam Baglioni
74f801b689
merging with branch beta
2021-07-27 13:18:31 +02:00
Miriam Baglioni
35e395eae8
merge with master
2021-07-27 12:34:59 +02:00
Miriam Baglioni
eb07f7f40f
Hosted By Map
2021-07-27 12:27:26 +02:00
Antonis Lempesis
ed185fd7ed
added missing colons
2021-07-27 11:42:47 +03:00
Antonis Lempesis
f3b9570354
properly invalidating metadata
2021-07-26 13:00:16 +03:00
Sandro La Bruzzo
848aabbb6c
minor fix
2021-07-25 12:06:41 +02:00
Sandro La Bruzzo
8fac10c91e
fixed definition of the wf creating the final infospace of scholexplorer
2021-07-25 11:15:37 +02:00
Sandro La Bruzzo
3920c69bc8
change implementation of resolve Relation to generate jsonRdd in output
2021-07-25 09:51:36 +02:00
Antonis Lempesis
f9fbb0f261
added indicators second sprint
2021-07-24 16:40:28 +03:00
Claudio Atzori
a0393607a7
mapping funding relations from Datacite should be done according to the actual result identifier
2021-07-23 18:15:08 +02:00
Claudio Atzori
5b6844b969
mapping funding relations from Datacite should be done according to the actual result identifier
2021-07-23 18:14:37 +02:00
Sandro La Bruzzo
d9e3b89937
implemented last part of workflows to generate scholixGraph
2021-07-23 16:38:32 +02:00
Sandro La Bruzzo
cfde63a7c3
fixed resolve relation join
2021-07-23 14:17:29 +02:00
Sandro La Bruzzo
4a439c3863
NPE fixed
2021-07-23 14:17:29 +02:00
Sandro La Bruzzo
ca74e8dd02
create a separate wf for resolving relation
2021-07-23 11:40:06 +02:00
Sandro La Bruzzo
43e9380cd3
update resolve relation to use the same format of openaire graph
2021-07-23 11:25:18 +02:00
Sandro La Bruzzo
058b636d4d
added control to check if the entity exists
2021-07-22 16:08:54 +02:00
Sandro La Bruzzo
62ae36a3d2
fixed NPE
2021-07-22 15:41:38 +02:00
Miriam Baglioni
63553a76b3
added code to download gold issn list from unibi
2021-07-22 12:01:48 +02:00
Miriam Baglioni
1a5b114906
DoiBoost AccessRight #4362 - refactoring
2021-07-22 12:00:23 +02:00
Sandro La Bruzzo
31d2d6d41e
Scholexplorer: introduction of dedup openaire
2021-07-21 18:09:32 +02:00
Miriam Baglioni
b226ba4439
merging with branch beta
2021-07-21 09:46:40 +02:00
Alessia Bardi
9069958479
tests for enermaps
2021-07-20 19:31:43 +02:00
Claudio Atzori
10d7b4f0b4
filtering 'old' OpenAIRE ids from the entity.originalId[] array in the OAF -> XML serialization procedure
2021-07-20 11:52:05 +02:00
Claudio Atzori
77e8c6c7f7
filtering 'old' OpenAIRE ids from the entity.originalId[] array in the OAF -> XML serialization procedure
2021-07-20 11:51:33 +02:00
Miriam Baglioni
83fe31c92e
changed the name of the workflows
2021-07-19 18:19:14 +02:00
Miriam Baglioni
dd81c36b60
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-07-19 18:18:14 +02:00
Miriam Baglioni
54acc5373b
changed the name of the workflows
2021-07-19 18:18:09 +02:00
Miriam Baglioni
b420b11ed3
duplicate the number of partitions in ProcessMag
2021-07-19 18:16:23 +02:00
Claudio Atzori
65934888a1
adding record identifier among the originalIds regardless of what IdentifierFactory produces
2021-07-19 17:52:52 +02:00
Claudio Atzori
5947cddafc
adding record identifier among the originalIds regardless of what IdentifierFactory produces
2021-07-19 17:52:24 +02:00
Claudio Atzori
0977baf41d
contents mapped from the stores with 'claim' interpretation will not change their identifier along their way towards the graph
2021-07-19 17:43:52 +02:00
Claudio Atzori
5e5f65a3c3
contents mapped from the stores with 'claim' interpretation will not change their identifier along their way towards the graph
2021-07-19 15:56:55 +02:00
Miriam Baglioni
662c396354
duplicate the number of partitions in ConvertCrossrefToOaf
2021-07-19 12:41:14 +02:00
Miriam Baglioni
59530a14fb
DoiBoost AccessRight #4362 - set BestAccessRight with the usual comparator
2021-07-19 12:34:35 +02:00
Miriam Baglioni
199123b74b
DoiBoost AccessRight #4362 - Fixed issue on date formatting. Added test method and associated resource
2021-07-16 17:30:27 +02:00
Miriam Baglioni
c4b18e6ccb
changed the download.sh, added skip step to allow to not execute one phase and changed the workflow sequence of steps
2021-07-16 15:01:25 +02:00
Miriam Baglioni
acd6056330
added shell action to automatically download the new dump and put it in a specified hdfs location
2021-07-16 12:47:10 +02:00
Miriam Baglioni
3bc9a05bc9
merging with branch beta
2021-07-16 10:32:27 +02:00
Miriam Baglioni
34506df1b6
DoiBoost AccessRight #4362 - if the journal is open, the OPEN access right is set to all instances and color is GOLD (overwrite if the color was already set in one of the previous steps)
2021-07-16 10:29:51 +02:00
Claudio Atzori
bf9e0d2d4f
Merge pull request 'orcid-no-doi' ( #123 ) from enrico.ottonello/dnet-hadoop:orcid-no-doi into beta
Reviewed-on: D-Net/dnet-hadoop#123
2021-07-15 17:59:41 +02:00
Sandro La Bruzzo
7e2caafe84
Scholexplorer: fixed mapping typologies
2021-07-15 09:53:12 +02:00
Enrico Ottonello
2dc50c0999
added default value to process path
2021-07-14 17:02:22 +02:00
Enrico Ottonello
66604bb2b4
added absolute path to process folder
2021-07-14 16:44:51 +02:00
Enrico Ottonello
7840cc6526
merged with master
2021-07-14 15:33:59 +02:00
Miriam Baglioni
4da46bb62f
merging with branch beta
2021-07-14 15:08:52 +02:00
Enrico Ottonello
a65667d217
added publication to dataset even if no contributors
2021-07-14 15:07:07 +02:00
Sandro La Bruzzo
10068c00ea
Code refactor:
- removed old workflows in doiboost
- split workflow of doiboost into preprocess and process
2021-07-14 14:45:50 +02:00
Miriam Baglioni
09ad7b2a9e
DoiBoost AccessRight #4362 - Unpaywall mapped to OAF with OPEN instance (non oa are filtered out) (unknown hostedby) + map the color as it is
2021-07-14 14:45:21 +02:00
Miriam Baglioni
f4f7c6f9d3
DoiBoost AccessRight #4362 - Unpaywall mapped to OAF with OPEN instance (non oa are filtered out) (unknown hostedby) + map the color as it is
2021-07-14 14:44:54 +02:00
Miriam Baglioni
6222adf176
DoiBoost AccessRight #4362 - added resources and test for crossref mapping (licence part included)
2021-07-14 14:42:34 +02:00
Miriam Baglioni
981b1018f6
DoiBoost AccessRight #4362 - decide access right according to licence. Default access right is Unknown
2021-07-14 14:42:06 +02:00
Sandro La Bruzzo
3d8e2aa146
Code refactor:
- removed old workflows in doiboost
- split workflow of doiboost into preprocess and process
2021-07-14 14:37:06 +02:00
Miriam Baglioni
441701c85c
DoiBoost AccessRight #4362 - If multiple licenses are available, take the one applied to 'vor'
2021-07-14 14:14:50 +02:00
Sandro La Bruzzo
c35c117601
fixed process doiboost workflow:
...
- split OrcidToOAF into two phases, preprocess and process
- updated workflow used in production
2021-07-14 12:48:01 +02:00
Miriam Baglioni
1cdd09cd8e
Tentative fix for testing of Jenkins
2021-07-14 11:14:59 +02:00
Sandro La Bruzzo
4cb65bc64a
fixed process doiboost workflow:
...
- split OrcidToOAF into two phases, preprocess and process
- updated workflow used in production
2021-07-14 09:44:32 +02:00
Miriam Baglioni
774cdb190e
changes to mirror the last dump of the graph with the ols data model.
2021-07-13 18:57:24 +02:00
Miriam Baglioni
886617afd0
One result linked to more than one project is saved just once
2021-07-13 18:15:35 +02:00
Miriam Baglioni
320cf02d96
Changed the way results linked to projects are found. We verify that the project is actually in the graph before selecting the result
2021-07-13 18:13:32 +02:00
Miriam Baglioni
52ce35d57b
-
2021-07-13 18:08:46 +02:00
Miriam Baglioni
970b387b8d
modification to allow dump of a single community
2021-07-13 18:08:10 +02:00
Miriam Baglioni
eae10c5894
modification to allow the dump for a single community
2021-07-13 18:07:25 +02:00
Miriam Baglioni
c028feef4f
workflow for the dump as sub workflows
2021-07-13 18:06:44 +02:00
Miriam Baglioni
d70f8c96fd
funding contains and not starts with h2020
2021-07-13 17:34:53 +02:00
Miriam Baglioni
5e38c7f42d
dumping only communities with status all
2021-07-13 17:32:38 +02:00
Claudio Atzori
734de62474
[doiboost] added workflow for the ActionSet update dedicated to production
2021-07-13 17:26:04 +02:00
Miriam Baglioni
618d2de2da
minor changes and refactoring
2021-07-13 17:10:02 +02:00
Miriam Baglioni
59615da65e
Add test to verify the creation of relation between context and projects
2021-07-13 17:09:15 +02:00
Miriam Baglioni
084b4ef999
added the creation of the openaireId from funder and grant number if the element is not present in the context profile
2021-07-13 17:07:46 +02:00
Claudio Atzori
fa720c1da4
[doiboost] added workflow for the ActionSet update dedicated to production
2021-07-13 16:59:30 +02:00
Miriam Baglioni
8f322a73cb
change because of the renaming of originalId to acronym
2021-07-13 16:22:58 +02:00
Miriam Baglioni
72397ea1ba
Added fix for community of arbitrary name length
2021-07-13 16:18:35 +02:00
Miriam Baglioni
5295d10691
added check not to dump deletedByInference entities
2021-07-13 16:11:46 +02:00
Claudio Atzori
9629569e22
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2021-07-13 16:04:08 +02:00
Claudio Atzori
f13e11e3f7
[aggregation] datacite wf: defined parameter declaring the path used to store the OAF objects produced by the transformation phase
2021-07-13 16:04:02 +02:00
Miriam Baglioni
e9a17ec899
added check to verify not to add void APC
2021-07-13 15:53:35 +02:00
Miriam Baglioni
8429aed6c6
Added resource for testing selection of valid relations
2021-07-13 15:49:38 +02:00
Miriam Baglioni
39b1a6edf6
added test class for the selection of valid relations and description
2021-07-13 15:23:09 +02:00
Miriam Baglioni
9a58f1b93d
added logic to select only the valid relations: those not deletedbyinference and having both parts of the relation as entities in the graph
2021-07-13 15:20:39 +02:00
Miriam Baglioni
13c66e16be
changed logic to split for communities
2021-07-13 15:15:27 +02:00
Miriam Baglioni
6410ab71d8
added APC in the dump and test method
2021-07-13 15:13:58 +02:00
Miriam Baglioni
65a242646d
added resource for APC dump
2021-07-13 14:45:25 +02:00
Miriam Baglioni
4b432fbee8
extended test class
2021-07-13 14:40:39 +02:00
Miriam Baglioni
87a6e2b967
extended test class
2021-07-13 14:38:28 +02:00
Miriam Baglioni
69fd40fd30
modified code to split the Croatian funder
2021-07-13 14:35:26 +02:00
Miriam Baglioni
86e50f7311
modified code to split the Croatian funder
2021-07-13 14:31:45 +02:00
Miriam Baglioni
da88c850c6
changed the logic to verify if a community is contained in the list of context of a result
2021-07-13 14:22:44 +02:00
Miriam Baglioni
2f66fedfec
changed the logic to verify if a community is contained in the list of context of a result
2021-07-13 14:22:23 +02:00
Miriam Baglioni
f5486ffb14
Fixed issues in tests
2021-07-13 14:07:45 +02:00
Claudio Atzori
e0061232e9
[aggregation] datacite wf: conditional creation of links, optional resume from intermediate phases
2021-07-13 13:41:21 +02:00
Sandro La Bruzzo
bbe8193930
merged stable ids
2021-07-12 17:00:43 +02:00
Claudio Atzori
ae2b47b29d
[broker] added coalesce(1) on the stats dataset before storing it on postgres
2021-07-09 15:47:51 +02:00
Sandro La Bruzzo
57c74c73c6
fixed mistakes in oozie workflow
2021-07-09 12:28:09 +02:00
Sandro La Bruzzo
61ccb54fde
removed wrong loop on oozie wf
2021-07-09 12:17:57 +02:00
Sandro La Bruzzo
9f5a0f3ab6
moved wf indexing of Scholexplorer in dhp-graph-provision
2021-07-09 12:06:43 +02:00
Sandro La Bruzzo
09fccf8000
added workflow to serialize scholix and summary in json
2021-07-09 11:01:42 +02:00
Sandro La Bruzzo
0ea576745f
updated CreateInputGraph because generics don't work on Spark Dataset
2021-07-09 10:29:24 +02:00
Sandro La Bruzzo
cd17e19044
implemented branch workflow to import datacite and crossref in scholexplorer
2021-07-08 21:20:19 +02:00
Miriam Baglioni
c30f3ce647
merge doi normalization
2021-07-08 19:20:02 +02:00
Sandro La Bruzzo
8a034e46e1
updated baseline workflow
2021-07-08 11:11:41 +02:00
Claudio Atzori
b7b8e0986e
[raw_all] The claim merge procedure includes the claimed contexts in the merged result
2021-07-08 10:42:31 +02:00
Sandro La Bruzzo
0799ac9fb6
fixed wrong path
2021-07-08 10:36:37 +02:00
Sandro La Bruzzo
4d53402712
extended ebiLinks to create a dataset before generation of OAF
2021-07-08 10:26:21 +02:00
Sandro La Bruzzo
a4a54a3786
code refactor
2021-07-08 09:08:25 +02:00
Sandro La Bruzzo
a01dbe0ab0
completed workflow of generation of scholix and summaries
2021-07-07 23:10:34 +02:00
Claudio Atzori
fdcff42e46
[raw_all] Aggregator graph creation merges claims (updates) with the corresponding entity
2021-07-07 19:01:59 +02:00
Claudio Atzori
777536ce91
[aggregation] string values used as regular expressions in the OAI collection classes are defined in a single point as constants, to be reused across the code (PR#122)
2021-07-07 11:23:48 +02:00
Claudio Atzori
bc014023c8
Merge pull request 'to solve the scala SI-3623' ( #122 ) from andreas.czerniak/BrStableId_dnet-hadoop:stable_ids into stable_ids
...
Reviewed-on: D-Net/dnet-hadoop#122
2021-07-07 11:13:51 +02:00
Claudio Atzori
32bdfdccbc
[raw_all] Aggregator graph creation merges claims (updates) with the corresponding entity
2021-07-07 11:08:27 +02:00
Andreas Czerniak
ebf3f47a02
from&until more OAI2.0 compl., adding tfs
2021-07-07 09:29:49 +02:00
Claudio Atzori
f580cb77e1
added mapping for claim relation 'resultResult_publicationDataset_isRelatedTo' (present on BETA)
2021-07-06 21:11:11 +02:00
Sandro La Bruzzo
ed684874f2
deleted old scholix project
2021-07-06 17:20:08 +02:00
Sandro La Bruzzo
8535506c22
added scholix generation
2021-07-06 17:18:06 +02:00
Sandro La Bruzzo
4c54bd8742
add test to verify merge scholix on source
2021-07-06 11:32:14 +02:00
Andreas Czerniak
3531802710
to solve the scala SI-3623
2021-07-06 11:30:56 +02:00
Sandro La Bruzzo
7d8db2eb8a
betterRenamingMethod
2021-07-06 09:56:32 +02:00
Sandro La Bruzzo
c952c8d236
generate first side of scholix mapping
2021-07-06 09:53:14 +02:00
Claudio Atzori
70ded407bb
HttpClient used in metadata collection retries also on 404
2021-07-05 18:04:30 +02:00
Miriam Baglioni
7177c25261
added check for null value during doi normalization
2021-07-05 16:22:38 +02:00
Miriam Baglioni
0892cad4e8
the normalization of the content of value was not visible outside the block. Moved doi normalization operation while returning value
2021-07-05 16:21:42 +02:00
Antonis Lempesis
89e6f46682
using organization ids instead of names in monitor db creation
2021-07-05 12:00:00 +03:00
Sandro La Bruzzo
e4b84ef5d6
fixed mapping OAF to Scholix summary
2021-07-02 16:48:48 +02:00
Sandro La Bruzzo
c6fa8598e1
massive code refactor:
...
removed modules dhp-*-scholexplorer
2021-07-01 22:13:45 +02:00
Antonis Lempesis
829caee4fd
added the missing indicators files
2021-06-30 17:31:33 +02:00
Sandro La Bruzzo
84b834c893
added test dataset test for pangaea
2021-06-30 17:31:09 +02:00
Sandro La Bruzzo
1a6b398968
implemented Creation of Raw Graph and Resolution
2021-06-30 17:27:55 +02:00
Miriam Baglioni
bc34347643
added assertions to verify doi normalization
2021-06-30 14:37:08 +02:00
Miriam Baglioni
86f47afcc7
slight modification of the resource to accommodate also doi normalization tests
2021-06-30 14:36:49 +02:00
Miriam Baglioni
03767ea8e6
slight modification of the resource to accommodate also doi normalization tests
2021-06-30 13:21:24 +02:00
Miriam Baglioni
f8eec0ca9a
added resource to test the normalization of doi during the import of MAG
2021-06-30 13:19:54 +02:00
Miriam Baglioni
149f85ddf5
added tests for the normalization of the dois
2021-06-30 13:00:52 +02:00
Miriam Baglioni
e487b5544c
added tests for the normalization of the dois
2021-06-30 12:57:11 +02:00
Miriam Baglioni
1503ccbbb5
added tests for the normalization of the dois
2021-06-30 12:55:37 +02:00
Miriam Baglioni
1299bfb357
Added class to test the normalization of doi
2021-06-30 12:53:27 +02:00
Sandro La Bruzzo
623a0c4edb
code Refactor, renaming packages
2021-06-30 11:09:30 +02:00
Miriam Baglioni
cf758f4f91
added normalization step for the doi
2021-06-30 10:03:15 +02:00
Miriam Baglioni
801763a0fa
there is no longer any need to lowercase the doi since it is done in the first step. Also changed the creation of the id by using the factory
2021-06-29 19:07:23 +02:00
Miriam Baglioni
a74de1cda2
added normalization step to the doi
2021-06-29 18:51:11 +02:00
Miriam Baglioni
06074ea7d3
added normalization step to the doi
2021-06-29 18:46:08 +02:00
Miriam Baglioni
8b8ffe82dc
added step of normalization for the doi
2021-06-29 18:41:39 +02:00
Miriam Baglioni
50cc21d92e
Added method to normalize doi values (lower case, remove all preceding 10., filtering out doi not starting with 10.)
2021-06-29 18:35:28 +02:00
Antonis Lempesis
87f14a3899
added the missing indicators files
2021-06-29 16:31:51 +03:00
Sandro La Bruzzo
db933ebd21
Merge remote-tracking branch 'origin/stable_ids' into stable_id_scholexplorer
2021-06-29 14:16:12 +02:00
Sandro La Bruzzo
7e08655e5f
added relation dates in all scholexplorer Datasources
2021-06-29 12:02:03 +02:00
Sandro La Bruzzo
075055eaca
added relation dates in bio mapping
2021-06-29 10:33:09 +02:00
Sandro La Bruzzo
f36f92287d
implemented mapping from Crossref Event Data to Oaf
2021-06-29 10:21:23 +02:00
Antonis Lempesis
018c4eb52c
copied latest changes from old fork: indicators+monitor institutions
2021-06-28 23:46:52 +03:00
Sandro La Bruzzo
511ec14c63
implemented mapping from EBI and Scholix Resolved to OAF
2021-06-28 22:04:22 +02:00
Claudio Atzori
af42377d0e
HttpClient used in metadata collection retries on 502, 503, 504
2021-06-28 09:34:30 +02:00
Sandro La Bruzzo
ad50415167
Merge remote-tracking branch 'origin/stable_ids' into stable_id_scholexplorer
2021-06-24 17:20:50 +02:00
Sandro La Bruzzo
80e15cc455
implemented mapping from uniprot, pdb and ebi links
2021-06-24 17:20:00 +02:00
Claudio Atzori
2e8fd2c531
cleanup
2021-06-23 14:38:24 +02:00
Claudio Atzori
4dc9ebf217
[raw_all] fixed unit test
2021-06-23 14:38:07 +02:00
Claudio Atzori
50fc5a64a0
[raw_all] Aggregator graph creation merges claims (updates) with the corresponding entity
2021-06-23 11:49:42 +02:00
Claudio Atzori
5edcc6832a
applying sonarLint suggestions
2021-06-23 09:53:29 +02:00
Sandro La Bruzzo
080a280bea
added pdb to Oaf Transformation
2021-06-21 16:23:59 +02:00
Sandro La Bruzzo
1dc0c59e20
merged fix thai dates from stable_ids
2021-06-21 10:39:46 +02:00
Sandro La Bruzzo
dc66cf615b
Merge branch 'stable_id_scholexplorer' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_id_scholexplorer
2021-06-21 09:38:33 +02:00
Sandro La Bruzzo
507e42102a
added pdb to oaf class
2021-06-21 09:36:40 +02:00
Sandro La Bruzzo
a167543637
Merge branch 'stable_ids' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_id_scholexplorer
2021-06-21 09:14:11 +02:00
Sandro La Bruzzo
4fe7b75644
renamed packages
2021-06-18 16:41:24 +02:00
Sandro La Bruzzo
3990165d05
changed typologies of unresolved relation
2021-06-18 11:43:59 +02:00
Miriam Baglioni
180d671127
Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids
2021-06-18 09:46:18 +02:00
Miriam Baglioni
13c96622c9
-
2021-06-18 09:45:16 +02:00
Miriam Baglioni
b486ae498f
added test and test resource to verify the generation of the date of acceptance from the input extracted from the dump
2021-06-18 09:43:32 +02:00
Miriam Baglioni
464c2ddde3
changed to split in two steps the generation of the crossref dataset
2021-06-18 09:42:31 +02:00
Miriam Baglioni
6aca0d8ebb
added kryo encoding for input files
2021-06-18 09:42:07 +02:00
Miriam Baglioni
3585e53da3
changed to split in two steps the generation of the crossref dataset
2021-06-18 09:41:23 +02:00
Claudio Atzori
41b551562e
applying PR#115 (DatePicker) on stable_ids
2021-06-17 09:33:50 +02:00
Sandro La Bruzzo
3100166d29
Merge remote-tracking branch 'origin/stable_ids' into stable_id_scholexplorer
2021-06-16 16:22:16 +02:00
Claudio Atzori
74833d04f1
Merge branch 'pids_beta' of https://code-repo.d4science.org/antonis.lempesis/dnet-hadoop into stable_ids
2021-06-16 15:54:18 +02:00
Claudio Atzori
7243a40c88
code formatting
2021-06-16 15:03:03 +02:00
Sandro La Bruzzo
dfcf78cf24
removed wrong code
2021-06-16 14:57:42 +02:00
Sandro La Bruzzo
cc0f2b11fb
Implemented mapping from pubmed baseline to OAF
2021-06-16 14:56:24 +02:00
Miriam Baglioni
95885bcf12
forces executor memory and driver executor memory to be 7G (trying to avoid OOM)
2021-06-16 10:17:52 +02:00
Miriam Baglioni
2550a73981
-
2021-06-16 10:04:41 +02:00
Miriam Baglioni
1c47c0d786
modified the number of executors trying to avoid OOM exception
2021-06-15 21:05:39 +02:00
Miriam Baglioni
7deac55138
added one option for resume from in the wf
2021-06-15 18:38:20 +02:00
Antonis Lempesis
f7c0b80e35
storing result_instance as parquet
2021-06-15 14:45:48 +03:00
Miriam Baglioni
66e7ef892f
changed the parameter name
2021-06-15 11:08:54 +02:00
Miriam Baglioni
4f47ad0891
no need to rename the folders, just write in overwrite mode, so I changed the name of the output folder
2021-06-15 09:28:31 +02:00
Miriam Baglioni
9f9dd00b94
refactoring
2021-06-15 09:24:46 +02:00
Miriam Baglioni
63d74ee379
refactoring
2021-06-15 09:24:11 +02:00
Miriam Baglioni
6ebc236657
added needed property: outputPath
2021-06-15 09:23:24 +02:00
Miriam Baglioni
f7379255b6
changed the workflow to extract info from the dump
2021-06-15 09:22:54 +02:00
Miriam Baglioni
d6e21bb6ea
creates the crossref dataset used for doiboost together with unpacking part from tar
2021-06-14 17:27:19 +02:00
Miriam Baglioni
4da141bd7c
Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids
2021-06-14 13:41:02 +02:00
Miriam Baglioni
ce0cfd79e0
creates the crossref dataset used for doiboost
2021-06-14 13:40:19 +02:00
Miriam Baglioni
93efe4de82
split the construction of crossref dataset in two parts. This one just unpacks the tar entries
2021-06-14 13:39:40 +02:00
Michele Artini
ada063ce70
fixed a problem with empty mdstore list (2)
2021-06-14 12:04:47 +02:00
Michele Artini
83132ee99a
fixed a problem with empty mdstore list
2021-06-14 11:57:00 +02:00
Miriam Baglioni
cf360d7c97
Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids
2021-06-14 10:19:49 +02:00
Miriam Baglioni
8873e6b6d1
workflow and parameter
2021-06-14 10:15:57 +02:00
Miriam Baglioni
0f1acdf6b6
workflow and parameter
2021-06-14 10:08:55 +02:00
Sandro La Bruzzo
aeb8132627
Merged branch stable_ids
2021-06-14 10:07:29 +02:00
Sandro La Bruzzo
efbea1e01a
minor fix
2021-06-14 09:45:14 +02:00
Miriam Baglioni
75780fc636
extraction of the tar for the dump of crossref, and creation of the dataset
2021-06-14 09:45:07 +02:00
Claudio Atzori
2039bb9f5f
orcid / orcid_pending cleaning backported from master branch
2021-06-14 09:40:50 +02:00
Claudio Atzori
dd19c4ac5a
Merge pull request 'import_new_mdstores' ( #112 ) from import_new_mdstores into stable_ids
...
Reviewed-on: D-Net/dnet-hadoop#112
2021-06-14 09:23:55 +02:00
Claudio Atzori
e9e86a237d
Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids
2021-06-11 17:00:02 +02:00
Claudio Atzori
a900bfb874
delegating the date parsing to https://github.com/sisyphsu/dateparser
2021-06-11 16:53:01 +02:00
Sandro La Bruzzo
dd997c49e0
fix wrong relation id
...
fix date thai ticket #6791
2021-06-10 14:47:18 +02:00
Antonis Lempesis
d413b24611
added instances, orgs for monitor, totalcost for projects, apcs
2021-06-10 02:35:46 +03:00
Claudio Atzori
741077dbca
Merge pull request 'Fix in Affiliation Propagation' ( #113 ) from miriam.baglioni/dnet-hadoop:master into stable_ids
...
Reviewed-on: D-Net/dnet-hadoop#113
2021-06-09 18:42:42 +02:00
Miriam Baglioni
32b0c27217
Update 'dhp-workflows/dhp-enrichment/src/main/java/eu/dnetlib/dhp/resulttoorganizationfrominstrepo/PrepareResultInstRepoAssociation.java'
...
fix in SQL query: while writing the blacklist constraint it used d.id to indicate the datasource id, but no alias for the datasource was defined. So I removed the alias
2021-06-09 18:36:11 +02:00
Sandro La Bruzzo
0d1f37302f
Merge branch 'stable_ids' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_id_scholexplorer
2021-06-09 09:35:16 +02:00
Miriam Baglioni
dc07f1079b
added check in case the author set to be enriched is null
2021-06-08 12:06:10 +02:00
Miriam Baglioni
8d2e086e48
changes to avoid reassignment to val
2021-06-07 17:50:37 +02:00
Miriam Baglioni
f33521d338
Update 'dhp-workflows/dhp-doiboost/src/main/java/eu/dnetlib/doiboost/orcid/SparkConvertORCIDToOAF.scala'
...
to be able to replace the object assigned to author, val has been replaced by var
2021-06-07 17:27:07 +02:00
Miriam Baglioni
bc12e9819e
Update 'dhp-workflows/dhp-doiboost/src/main/java/eu/dnetlib/doiboost/orcid/SparkConvertORCIDToOAF.scala'
...
The change fixes the issue that arises when the same work appears more than once on the same ORCID profile. It avoids replicating the association doi -> author when the orcid id is already associated with the doi.
2021-06-07 16:37:01 +02:00
Sandro La Bruzzo
0cdb7ccdaa
added inverse relations to datacite mapping
2021-06-04 15:10:20 +02:00
Sandro La Bruzzo
5b724d9972
added relations to datacite mapping
2021-06-04 10:14:22 +02:00
Sandro La Bruzzo
e57294ac99
implemented changes on PUBMed dataflow
2021-06-03 10:52:09 +02:00
Michele Artini
ede2749822
orcid pid type
2021-06-01 12:42:43 +02:00
Michele Artini
f0fbfdcfae
Merge branch 'stable_ids' into import_new_mdstores
2021-06-01 12:03:00 +02:00
Michele Artini
e950750262
add nodes to import hdfs mdstores
2021-06-01 10:48:50 +02:00
Michele Artini
03a510859a
removed coalesce(1)
2021-05-31 14:10:51 +02:00
Michele Artini
e9f2b6037c
patch of mdstore records
2021-05-31 11:36:26 +02:00
Sandro La Bruzzo
02ef46535f
Merge branch 'stable_ids' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_ids
2021-05-31 09:50:15 +02:00
Sandro La Bruzzo
aeadc5a366
updated wf Datacite Import to retrieve the block size as parameter
2021-05-31 09:49:53 +02:00
Claudio Atzori
96238152cb
added serialization for alternateIdentifiers and pids within each record instance
2021-05-28 16:57:30 +02:00