Andrea Mannocci
|
c6e079d90b
|
renaming notebooks
|
2022-03-17 10:33:33 +01:00 |
Andrea Mannocci
|
9b7c2efc5d
|
Merge branch 'master' of https://code-repo.d4science.org/andrea.mannocci/data_registries_analysis
|
2022-02-17 13:27:56 +01:00 |
Andrea Mannocci
|
f84f5ee093
|
rerun dedup partitions analysis
|
2022-02-17 13:27:47 +01:00 |
miconis
|
efdbc76181
|
minor changes to fix relative paths
|
2022-02-17 10:23:06 +01:00 |
Andrea Mannocci
|
3353c08405
|
restructuring the project
|
2022-02-16 15:29:05 +01:00 |
Andrea Mannocci
|
abfb626faa
|
datasets updated. new dedup. new partitions
|
2022-02-16 15:27:26 +01:00 |
miconis
|
594ba0e1c7
|
script to create the csv file based on the mergerels, generation of the mergerels and the deduplication csv
|
2022-02-16 13:04:59 +01:00 |
miconis
|
b31e97f71e
|
script to process ds dumps and create json full dump, ds dedup configuration added
|
2022-02-16 11:30:34 +01:00 |
Andrea Mannocci
|
e537f30a32
|
rerunning notebooks
|
2022-02-14 13:34:42 +01:00 |
Miriam Baglioni
|
6c71bde5f8
|
OpenDOAR, re3data and roar data updated
|
2022-02-14 13:23:50 +01:00 |
Andrea Mannocci
|
2df1b0ca2e
|
new dump from fairsharing
|
2022-02-14 12:26:36 +01:00 |
Andrea Mannocci
|
722a9aa0cf
|
rewiring subject and geo analysis
|
2021-10-08 14:28:56 +02:00 |
Andrea Mannocci
|
dff032e2b3
|
regenerating datasets with proper column names
|
2021-10-08 13:55:20 +02:00 |
Andrea Mannocci
|
a8b52c6931
|
regenerating datasets with proper column names
|
2021-10-08 13:42:09 +02:00 |
Andrea Mannocci
|
dca072f654
|
rewiring notebook for single registry inspection
|
2021-10-08 12:46:14 +02:00 |
Andrea Mannocci
|
98075dbae9
|
counting duplicates within
|
2021-10-08 11:25:50 +02:00 |
Andrea Mannocci
|
cc2c004b9e
|
counting duplicates within
|
2021-10-08 10:51:04 +02:00 |
Andrea Mannocci
|
a55db56e2e
|
counting duplicates within
|
2021-10-08 10:50:42 +02:00 |
Miriam Baglioni
|
8f3175f792
|
last updates
|
2021-10-07 11:19:40 +02:00 |
Miriam Baglioni
|
021c9b4db3
|
Merge branch 'master' of https://code-repo.d4science.org/andrea.mannocci/registries_analysis
|
2021-10-07 10:01:59 +02:00 |
Miriam Baglioni
|
f8733ffc5f
|
new OpenDOAR with missing field (sorry Andre)
|
2021-10-07 10:01:45 +02:00 |
Andrea Mannocci
|
c072a0a90f
|
recreating dataframes
|
2021-10-07 09:39:37 +02:00 |
Andrea Mannocci
|
9dfedb2a7b
|
recreating dataframes
|
2021-10-07 09:37:21 +02:00 |
Andrea Mannocci
|
da8f0818df
|
Merge branch 'master' of https://code-repo.d4science.org/andrea.mannocci/data_registries_analysis
|
2021-10-07 09:36:12 +02:00 |
Miriam Baglioni
|
490b69833a
|
new OpenDOAR with missing field
|
2021-10-07 09:32:45 +02:00 |
Andrea Mannocci
|
02e7ed79a2
|
recreating dataframes
|
2021-10-07 08:59:07 +02:00 |
Miriam Baglioni
|
c0892c676c
|
new mapping with dictionary to list the field name of wrapper elements also for OpenDOAR
|
2021-10-06 22:51:08 +02:00 |
Miriam Baglioni
|
cae8426ef7
|
new mapping with dictionary to list the field name of wrapper elements
|
2021-10-06 18:04:01 +02:00 |
Miriam Baglioni
|
0fefbfd2c8
|
new mapping
|
2021-10-06 16:36:16 +02:00 |
Andrea Mannocci
|
74bb9edd04
|
added simple checks for across registrations
|
2021-10-06 15:22:23 +02:00 |
Andrea Mannocci
|
264f527fcb
|
partitioned dup groups
|
2021-10-04 13:46:28 +02:00 |
Andrea Mannocci
|
13f18b5d33
|
partitioned dup groups
|
2021-10-01 17:26:00 +02:00 |
miconis
|
84b32e5d33
|
addition of ds dedup with levenshtein distance and 0.9 threshold
|
2021-10-01 13:03:33 +02:00 |
Andrea Mannocci
|
76973593b6
|
removed old datasets
|
2021-10-01 12:05:35 +02:00 |
Andrea Mannocci
|
b62be6dde8
|
downloaded FS again with metadata block
|
2021-10-01 12:00:03 +02:00 |
Andrea Mannocci
|
c5411b0af0
|
downloaded FS again
|
2021-10-01 09:09:49 +02:00 |
Miriam Baglioni
|
10d08e4251
|
-
|
2021-09-30 12:46:55 +02:00 |
Miriam Baglioni
|
1c21796be8
|
-
|
2021-09-30 12:31:04 +02:00 |
Miriam Baglioni
|
56f871dc4d
|
let's try again....
|
2021-09-30 12:28:01 +02:00 |
Andrea Mannocci
|
7e3d933641
|
fixed re3data
|
2021-09-30 12:23:12 +02:00 |
Miriam Baglioni
|
ca67c1c928
|
new version of .tsv
|
2021-09-30 12:16:35 +02:00 |
Andrea Mannocci
|
b7226dfcc7
|
renaming fairsharing
|
2021-09-30 11:57:01 +02:00 |
Miriam Baglioni
|
b94e693c57
|
Merge branch 'master' of https://code-repo.d4science.org/andrea.mannocci/registries_analysis
|
2021-09-30 11:55:43 +02:00 |
Miriam Baglioni
|
c4c13fd6f2
|
Using values instead of None for empty fields
|
2021-09-30 11:54:56 +02:00 |
Andrea Mannocci
|
778882aa64
|
renaming re3data
|
2021-09-30 11:32:54 +02:00 |
Andrea Mannocci
|
93986fcec6
|
adding old opendoar dataset which was working fine
|
2021-09-30 11:31:24 +02:00 |
Andrea Mannocci
|
faa8fd69cf
|
adding raw fairsharing extended
|
2021-09-29 11:18:12 +02:00 |
Andrea Mannocci
|
fffd8be0e0
|
adding raw fairsharing
|
2021-09-28 14:57:12 +02:00 |
Andrea Mannocci
|
52f10ca94b
|
Merge branch 'master' of https://code-repo.d4science.org/andrea.mannocci/data_registries_analysis
|
2021-09-28 14:56:19 +02:00 |
Andrea Mannocci
|
ecf5bd9ad7
|
adding raw fairsharing
|
2021-09-28 14:56:12 +02:00 |