Miriam Baglioni
|
c26980f1c4
|
Adding spark.close() to avoid the "Only one SparkContext may be running in this JVM" error while running tests on Jenkins, and fixed issue
|
2021-07-13 10:33:00 +02:00 |
Miriam Baglioni
|
1ea66e8917
|
some more tests for authormerger
|
2021-07-12 10:06:29 +02:00 |
Miriam Baglioni
|
b0d86d32b0
|
added list of author to be merged
|
2021-07-08 18:56:29 +02:00 |
Miriam Baglioni
|
abe546e5ba
|
added resource files for test author merger for empty crossref and other merging providers (related to DoiBoostAuthorMerger)
|
2021-07-08 18:55:55 +02:00 |
Miriam Baglioni
|
bf24f588e2
|
Added test for empty author list for crossref and other merging providers (related to DoiBoostAuthorMerger)
|
2021-07-08 18:55:13 +02:00 |
Miriam Baglioni
|
96255fa647
|
-
|
2021-07-08 18:54:27 +02:00 |
Miriam Baglioni
|
0e47e94099
|
Added variable to verify if crossref is base for the merging of authors (related to DoiBoostAuthorMerger)
|
2021-07-08 18:54:07 +02:00 |
Miriam Baglioni
|
434aa6380b
|
Adding description of the merging process for DoiBoost (related to DoiBoostAuthorMerger) - to be refined
|
2021-07-08 18:53:15 +02:00 |
Miriam Baglioni
|
e0e80cde22
|
Added class to store the most similar author list to be enriched w.r.t. one enriching author (related to DoiBoostAuthorMerger)
|
2021-07-08 18:52:25 +02:00 |
Miriam Baglioni
|
97e0c27db9
|
Added check for empty author list. If crossref is empty, the longest from all the merging providers is taken. If crossref is not empty, crossref is chosen as base for the enrichment
|
2021-07-08 15:27:05 +02:00 |
Miriam Baglioni
|
7498e63174
|
added resource files for testing of DoiBoostAuthorMerger
|
2021-07-05 16:26:46 +02:00 |
Miriam Baglioni
|
22ce947335
|
added resource files for testing of DoiBoostAuthorMerger
|
2021-07-05 16:26:17 +02:00 |
Miriam Baglioni
|
f64f5d9e23
|
first implementation and test class for the specific Author Merger for doiboost. First change: crossref as base to be enriched. Modified the normalization function to remove accents from words
|
2021-07-05 16:24:47 +02:00 |
Miriam Baglioni
|
238d692a0a
|
apply specific AuthorMerger for doiboost
|
2021-07-05 16:23:33 +02:00 |
Miriam Baglioni
|
7177c25261
|
added check for null value during doi normalization
|
2021-07-05 16:22:38 +02:00 |
Miriam Baglioni
|
0892cad4e8
|
the normalization of the content of value was not visible outside the block. Moved doi normalization operation while returning value
|
2021-07-05 16:21:42 +02:00 |
Miriam Baglioni
|
bc34347643
|
added assertions to verify doi normalization
|
2021-06-30 14:37:08 +02:00 |
Miriam Baglioni
|
86f47afcc7
|
slight modification of the resource to accommodate also doi normalization tests
|
2021-06-30 14:36:49 +02:00 |
Miriam Baglioni
|
03767ea8e6
|
slight modification of the resource to accommodate also doi normalization tests
|
2021-06-30 13:21:24 +02:00 |
Miriam Baglioni
|
f8eec0ca9a
|
added resource to test the normalization of doi during the import of MAG
|
2021-06-30 13:19:54 +02:00 |
Miriam Baglioni
|
149f85ddf5
|
added tests for the normalization of the dois
|
2021-06-30 13:00:52 +02:00 |
Miriam Baglioni
|
e487b5544c
|
added tests for the normalization of the dois
|
2021-06-30 12:57:11 +02:00 |
Miriam Baglioni
|
1503ccbbb5
|
added tests for the normalization of the dois
|
2021-06-30 12:55:37 +02:00 |
Miriam Baglioni
|
1299bfb357
|
Added class to test the normalization of doi
|
2021-06-30 12:53:27 +02:00 |
Miriam Baglioni
|
cf758f4f91
|
added normalization step for the doi
|
2021-06-30 10:03:15 +02:00 |
Miriam Baglioni
|
801763a0fa
|
there is no longer any need to lowercase the doi since it is done in the first step. Also changed the creation of the id by using the factory
|
2021-06-29 19:07:23 +02:00 |
Miriam Baglioni
|
a74de1cda2
|
added normalization step to the doi
|
2021-06-29 18:51:11 +02:00 |
Miriam Baglioni
|
06074ea7d3
|
added normalization step to the doi
|
2021-06-29 18:46:08 +02:00 |
Miriam Baglioni
|
8b8ffe82dc
|
added step of normalization for the doi
|
2021-06-29 18:41:39 +02:00 |
Miriam Baglioni
|
50cc21d92e
|
Added method to normalize doi values (lower case, remove all preceding 10., filtering out doi not starting with 10.)
|
2021-06-29 18:35:28 +02:00 |
Miriam Baglioni
|
13c96622c9
|
-
|
2021-06-18 09:45:16 +02:00 |
Miriam Baglioni
|
b486ae498f
|
added test and test resource to verify the generation of the date of acceptance from the input extracted from the dump
|
2021-06-18 09:43:32 +02:00 |
Miriam Baglioni
|
464c2ddde3
|
changed to split in two steps the generation of the crossref dataset
|
2021-06-18 09:42:31 +02:00 |
Miriam Baglioni
|
6aca0d8ebb
|
added kryo encoding for input files
|
2021-06-18 09:42:07 +02:00 |
Miriam Baglioni
|
3585e53da3
|
changed to split in two steps the generation of the crossref dataset
|
2021-06-18 09:41:23 +02:00 |
Miriam Baglioni
|
95885bcf12
|
forces executor memory and driver executor memory to be 7G (trying to avoid OOM)
|
2021-06-16 10:17:52 +02:00 |
Miriam Baglioni
|
2550a73981
|
-
|
2021-06-16 10:04:41 +02:00 |
Miriam Baglioni
|
1c47c0d786
|
modified the number of executors trying to avoid OOM exception
|
2021-06-15 21:05:39 +02:00 |
Miriam Baglioni
|
7deac55138
|
added one option for resume from in the wf
|
2021-06-15 18:38:20 +02:00 |
Miriam Baglioni
|
66e7ef892f
|
changed the parameter name
|
2021-06-15 11:08:54 +02:00 |
Miriam Baglioni
|
4f47ad0891
|
no need to rename the folders, just write in overwrite mode, so I changed the name of the output folder
|
2021-06-15 09:28:31 +02:00 |
Miriam Baglioni
|
9f9dd00b94
|
refactoring
|
2021-06-15 09:24:46 +02:00 |
Miriam Baglioni
|
63d74ee379
|
refactoring
|
2021-06-15 09:24:11 +02:00 |
Miriam Baglioni
|
6ebc236657
|
added needed property: outputPath
|
2021-06-15 09:23:24 +02:00 |
Miriam Baglioni
|
f7379255b6
|
changed the workflow to extract info from the dump
|
2021-06-15 09:22:54 +02:00 |
Miriam Baglioni
|
d6e21bb6ea
|
creates the crossref dataset used for doiboost together with unpacking part from tar
|
2021-06-14 17:27:19 +02:00 |
Miriam Baglioni
|
ce0cfd79e0
|
creates the crossref dataset used for doiboost
|
2021-06-14 13:40:19 +02:00 |
Miriam Baglioni
|
93efe4de82
|
split the construction of crossref dataset in two parts. This one just unpacks the tar entries
|
2021-06-14 13:39:40 +02:00 |
Miriam Baglioni
|
8873e6b6d1
|
workflow and parameter
|
2021-06-14 10:15:57 +02:00 |
Miriam Baglioni
|
0f1acdf6b6
|
workflow and parameter
|
2021-06-14 10:08:55 +02:00 |