Myrto Kallipoliti
|
44f0f9987f
|
Merge pull request 'Oalex' (#13) from openaire-workflow-ready_2 into openaire-workflow-ready
Reviewed-on: #13
|
2024-12-09 18:51:22 +01:00 |
Miriam Baglioni
|
ad691c28c2
|
[oalex] add a thread to monitor the number of operations done by affRo up to a certain point
|
2024-12-06 10:19:53 +01:00 |
Miriam Baglioni
|
2806511e02
|
[oalex] change collect_list to collect_set so that each match appears only once
|
2024-12-05 21:26:08 +01:00 |
Miriam Baglioni
|
0043e4051f
|
[oalex] renaming
|
2024-12-05 18:44:06 +01:00 |
Miriam Baglioni
|
a59d0ce9fc
|
[oalex] avoid redefinition of explode function
|
2024-12-05 18:41:16 +01:00 |
Miriam Baglioni
|
e2f8007433
|
[oalex] added fix
|
2024-12-05 16:50:10 +01:00 |
Miriam Baglioni
|
f8479083f2
|
[oalex] passing the schema to avoid changes in the confidence type
|
2024-12-05 16:44:17 +01:00 |
Miriam Baglioni
|
9440f863c9
|
[oalex] changed implementation to pass through an RDD to avoid calling the UDF function
|
2024-12-05 16:36:38 +01:00 |
Miriam Baglioni
|
f78456288c
|
[oalex] fix issue
|
2024-12-05 12:54:10 +01:00 |
Miriam Baglioni
|
997f2e492f
|
[oalex] change the call of the function in the dataframe
|
2024-12-05 12:35:59 +01:00 |
Miriam Baglioni
|
982a1b0b9f
|
[oalex] change the call of the function in the dataframe
|
2024-12-05 12:21:21 +01:00 |
Miriam Baglioni
|
4fe3d31ed5
|
[oalex] register the UDF oalex_affro and the schema of the output to be used in the dataframe by pyspark
|
2024-12-05 12:18:45 +01:00 |
Miriam Baglioni
|
efa4db4e52
|
[oalex] execute affRo on distinct affiliation_strings
|
2024-12-05 12:02:40 +01:00 |
Miriam Baglioni
|
ea2e27a9f4
|
[oalex] fix python syntax errors
|
2024-12-05 11:22:10 +01:00 |
Miriam Baglioni
|
e33bf4ef14
|
[oalex] proposal to increase the parallelization
|
2024-12-05 10:39:00 +01:00 |
Miriam Baglioni
|
f4704aef4d
|
[oalex] proposal to increase the parallelization
|
2024-12-05 10:27:32 +01:00 |
Miriam Baglioni
|
0500fc586f
|
Added input/output path as parameters
|
2024-12-04 15:14:58 +01:00 |
Miriam Baglioni
|
5568aa92ec
|
Remove from path
|
2024-12-03 16:54:47 +01:00 |
Miriam Baglioni
|
600ddf8087
|
Remove directory name
Change to make the file discoverable on the cluster
|
2024-12-03 16:45:57 +01:00 |
mkallipo
|
03dc19fd3b
|
add gitignore
|
2024-12-01 20:04:32 +01:00 |
mkallipo
|
d9dbc679e3
|
updates
|
2024-12-01 20:00:49 +01:00 |
mkallipo
|
413ec3773e
|
updates -datacite
|
2024-11-21 13:32:50 +01:00 |
mkallipo
|
ba98a16bcb
|
updates -openorgs
|
2024-11-21 12:39:26 +01:00 |
mkallipo
|
415b45e3ca
|
updates
|
2024-10-28 11:13:55 +01:00 |
mkallipo
|
8c6f6a5a9a
|
crossref
|
2024-10-24 09:32:08 +02:00 |
mkallipo
|
b4f79adc56
|
path
|
2024-10-18 13:19:41 +02:00 |
mkallipo
|
90426a6d29
|
path
|
2024-10-18 13:12:00 +02:00 |
mkallipo
|
ad656121ed
|
arguments
|
2024-10-18 10:48:18 +02:00 |
mkallipo
|
ca6e8ad3b9
|
.
|
2024-10-16 13:29:39 +02:00 |
mkallipo
|
8325c94e56
|
strings.py
|
2024-10-16 12:42:51 +02:00 |
mkallipo
|
5795ec6493
|
general, affiliated stopwords
|
2024-10-07 11:45:41 +02:00 |
mkallipo
|
57569fbb3b
|
dix_acad, zu stopword
|
2024-10-07 11:39:21 +02:00 |
mkallipo
|
968ecf9680
|
multi
|
2024-10-07 11:35:15 +02:00 |
mkallipo
|
2c6e7b7a70
|
multi
|
2024-10-07 11:25:16 +02:00 |
mkallipo
|
9473c30a09
|
dictionaries
|
2024-10-06 22:09:42 +02:00 |
mkallipo
|
bace694d21
|
updates
|
2024-09-19 21:37:28 +02:00 |
mkallipo
|
a7b703b67d
|
updates german terms, /
|
2024-09-17 12:06:29 +02:00 |
mkallipo
|
b38be012a0
|
updates abbr
|
2024-09-16 12:20:37 +02:00 |
mkallipo
|
fbf55b3d5d
|
redirection of inactive ROR ids
|
2024-09-12 15:56:26 +02:00 |
mkallipo
|
0c98ba76a6
|
initial commit
|
2024-09-05 12:23:32 +02:00 |
Myrto Kallipoliti
|
530e474d7c
|
Upload files to "dictionaries"
|
2024-09-05 12:17:09 +02:00 |