40 Commits

Author SHA1 Message Date
Miriam Baglioni ad691c28c2 [oalex] change to add a thread to monitor the number of operations done by affro up to a certain point 2024-12-06 10:19:53 +01:00
Miriam Baglioni 2806511e02 [oalex] change collect_list to collect_set so that each match appears only once 2024-12-05 21:26:08 +01:00
Miriam Baglioni 0043e4051f [oalex] renaming 2024-12-05 18:44:06 +01:00
Miriam Baglioni a59d0ce9fc [oalex] avoid redefinition of explode function 2024-12-05 18:41:16 +01:00
Miriam Baglioni e2f8007433 [oalex] added fix 2024-12-05 16:50:10 +01:00
Miriam Baglioni f8479083f2 [oalex] passing the schema to avoid changes in the confidence type 2024-12-05 16:44:17 +01:00
Miriam Baglioni 9440f863c9 [oalex] changed implementation, passing through rdd to avoid calling udf function 2024-12-05 16:36:38 +01:00
Miriam Baglioni f78456288c [oalex] fix issue 2024-12-05 12:54:10 +01:00
Miriam Baglioni 997f2e492f [oalex] change the call of the function in the dataframe 2024-12-05 12:35:59 +01:00
Miriam Baglioni 982a1b0b9f [oalex] change the call of the function in the dataframe 2024-12-05 12:21:21 +01:00
Miriam Baglioni 4fe3d31ed5 [oalex] register the UDF oalex_affro and the schema of the output to be used in the dataframe by pyspark 2024-12-05 12:18:45 +01:00
Miriam Baglioni efa4db4e52 [oalex] execute affRo on distinct affilitaion_strings 2024-12-05 12:02:40 +01:00
Miriam Baglioni ea2e27a9f4 [oalex] fix python syntax errors 2024-12-05 11:22:10 +01:00
Miriam Baglioni e33bf4ef14 [oalex] proposal to increase the parallelization 2024-12-05 10:39:00 +01:00
Miriam Baglioni f4704aef4d [oalex] proposal to increase the parallelization 2024-12-05 10:27:32 +01:00
Miriam Baglioni 0500fc586f Added input/output path as parameters 2024-12-04 15:14:58 +01:00
Miriam Baglioni 5568aa92ec Remove from path 2024-12-03 16:54:47 +01:00
Miriam Baglioni 600ddf8087 Remove directory name; change to make the file discoverable on the cluster 2024-12-03 16:45:57 +01:00
mkallipo 03dc19fd3b add gitignore 2024-12-01 20:04:32 +01:00
mkallipo d9dbc679e3 updates 2024-12-01 20:00:49 +01:00
mkallipo 413ec3773e updates -datacite 2024-11-21 13:32:50 +01:00
mkallipo ba98a16bcb updates -openorgs 2024-11-21 12:39:26 +01:00
mkallipo 415b45e3ca updates 2024-10-28 11:13:55 +01:00
mkallipo 8c6f6a5a9a crosserf 2024-10-24 09:32:08 +02:00
mkallipo b4f79adc56 path 2024-10-18 13:19:41 +02:00
mkallipo 90426a6d29 path 2024-10-18 13:12:00 +02:00
mkallipo ad656121ed arguments 2024-10-18 10:48:18 +02:00
mkallipo ca6e8ad3b9 . 2024-10-16 13:29:39 +02:00
mkallipo 8325c94e56 strings.py 2024-10-16 12:42:51 +02:00
mkallipo 5795ec6493 general, afiliated stopwords 2024-10-07 11:45:41 +02:00
mkallipo 57569fbb3b dix_acad, zu stopword 2024-10-07 11:39:21 +02:00
mkallipo 968ecf9680 multi 2024-10-07 11:35:15 +02:00
mkallipo 2c6e7b7a70 multi 2024-10-07 11:25:16 +02:00
mkallipo 9473c30a09 dictionaries 2024-10-06 22:09:42 +02:00
mkallipo bace694d21 updates 2024-09-19 21:37:28 +02:00
mkallipo a7b703b67d updates german terms, / 2024-09-17 12:06:29 +02:00
mkallipo b38be012a0 updates abbr 2024-09-16 12:20:37 +02:00
mkallipo fbf55b3d5d redirection of non active ror ids 2024-09-12 15:56:26 +02:00
mkallipo 0c98ba76a6 initial commit 2024-09-05 12:23:32 +02:00
Myrto Kallipoliti 530e474d7c Upload files to "dictionaries" 2024-09-05 12:17:09 +02:00