Commit Graph

52 Commits

Author SHA1 Message Date
Andrea Mannocci 722a9aa0cf rewiring subject and geo analysis 2021-10-08 14:28:56 +02:00
Andrea Mannocci dff032e2b3 regenerating datasets with proper column names 2021-10-08 13:55:20 +02:00
Andrea Mannocci a8b52c6931 regenerating datasets with proper column names 2021-10-08 13:42:09 +02:00
Andrea Mannocci dca072f654 rewiring notebook for single registry inspection 2021-10-08 12:46:14 +02:00
Andrea Mannocci 98075dbae9 counting duplicates within 2021-10-08 11:25:50 +02:00
Andrea Mannocci cc2c004b9e counting duplicates within 2021-10-08 10:51:04 +02:00
Andrea Mannocci a55db56e2e counting duplicates within 2021-10-08 10:50:42 +02:00
Miriam Baglioni 8f3175f792 last updates 2021-10-07 11:19:40 +02:00
Miriam Baglioni 021c9b4db3 Merge branch 'master' of https://code-repo.d4science.org/andrea.mannocci/registries_analysis 2021-10-07 10:01:59 +02:00
Miriam Baglioni f8733ffc5f new OpenDOAR with missing field (sorry Andre) 2021-10-07 10:01:45 +02:00
Andrea Mannocci c072a0a90f recreating dataframes 2021-10-07 09:39:37 +02:00
Andrea Mannocci 9dfedb2a7b recreating dataframes 2021-10-07 09:37:21 +02:00
Andrea Mannocci da8f0818df Merge branch 'master' of https://code-repo.d4science.org/andrea.mannocci/data_registries_analysis 2021-10-07 09:36:12 +02:00
Miriam Baglioni 490b69833a new OpenDOAR with missing field 2021-10-07 09:32:45 +02:00
Andrea Mannocci 02e7ed79a2 recreating dataframes 2021-10-07 08:59:07 +02:00
Miriam Baglioni c0892c676c new mapping with dictionary to list the field name of wrapper elements also for OpenDOAR 2021-10-06 22:51:08 +02:00
Miriam Baglioni cae8426ef7 new mapping with dictionary to list the field name of wrapper elements 2021-10-06 18:04:01 +02:00
Miriam Baglioni 0fefbfd2c8 new mapping 2021-10-06 16:36:16 +02:00
Andrea Mannocci 74bb9edd04 added simple checks for across registrations 2021-10-06 15:22:23 +02:00
Andrea Mannocci 264f527fcb partitioned dup groups 2021-10-04 13:46:28 +02:00
Andrea Mannocci 13f18b5d33 partitioned dup groups 2021-10-01 17:26:00 +02:00
miconis 84b32e5d33 addition of ds dedup with levenshtein distance and 0.9 threshold 2021-10-01 13:03:33 +02:00
Andrea Mannocci 76973593b6 removed old datasets 2021-10-01 12:05:35 +02:00
Andrea Mannocci b62be6dde8 downloaded FS again with metadata block 2021-10-01 12:00:03 +02:00
Andrea Mannocci c5411b0af0 downloaded FS again 2021-10-01 09:09:49 +02:00
Miriam Baglioni 10d08e4251 - 2021-09-30 12:46:55 +02:00
Miriam Baglioni 1c21796be8 - 2021-09-30 12:31:04 +02:00
Miriam Baglioni 56f871dc4d let's try again.... 2021-09-30 12:28:01 +02:00
Andrea Mannocci 7e3d933641 fixed re3data 2021-09-30 12:23:12 +02:00
Miriam Baglioni ca67c1c928 new version of .tsv 2021-09-30 12:16:35 +02:00
Andrea Mannocci b7226dfcc7 renaming fairsharing 2021-09-30 11:57:01 +02:00
Miriam Baglioni b94e693c57 Merge branch 'master' of https://code-repo.d4science.org/andrea.mannocci/registries_analysis 2021-09-30 11:55:43 +02:00
Miriam Baglioni c4c13fd6f2 Using values instad of None for empty fields 2021-09-30 11:54:56 +02:00
Andrea Mannocci 778882aa64 renaming re3data 2021-09-30 11:32:54 +02:00
Andrea Mannocci 93986fcec6 adding old opendoar dataset which was working fine 2021-09-30 11:31:24 +02:00
Andrea Mannocci faa8fd69cf adding raw fairsharing extended 2021-09-29 11:18:12 +02:00
Andrea Mannocci fffd8be0e0 adding raw fairsharing 2021-09-28 14:57:12 +02:00
Andrea Mannocci 52f10ca94b Merge branch 'master' of https://code-repo.d4science.org/andrea.mannocci/data_registries_analysis 2021-09-28 14:56:19 +02:00
Andrea Mannocci ecf5bd9ad7 addidng raw fairsharing 2021-09-28 14:56:12 +02:00
Miriam Baglioni 8d376a54f2 fixed openDoar.tsv 2021-09-24 11:47:33 +02:00
Miriam Baglioni 96fbaa553e new mapping from the last import in OpenAIRE 2021-09-23 14:49:25 +02:00
Andrea Mannocci 6abcd9b142 new dedup file 2021-09-22 11:59:30 +02:00
Andrea Mannocci 7ab83cbb10 starting to analyse overlap 2021-07-26 11:15:14 +02:00
Andrea Mannocci dd6b79e69f each registry has a basic analysis 2021-07-23 15:28:23 +02:00
Miriam Baglioni 434fe5ed20 Merge branch 'master' of https://code-repo.d4science.org/andrea.mannocci/registries_analysis 2021-07-23 14:18:21 +02:00
Andrea Mannocci c2943c4818 each registry has a basic analysis 2021-07-23 12:41:17 +02:00
Andrea Mannocci be70e8cc74 added first dedup results 2021-07-23 12:38:56 +02:00
Miriam Baglioni 76ad6143ea added the mapping with hopefully nan values when empty elements in the xml 2021-07-22 13:10:02 +02:00
Andrea Mannocci 63661dfb32 new notebooks 2021-07-22 11:35:40 +02:00
Andrea Mannocci c6d01322c3 added datasets 2021-07-22 11:03:05 +02:00