
Affiliation-Matching Repository [aka AffRo]

This repository contains code for matching affiliation strings to ROR and/or OpenOrgs IDs.

🚀 This repository is still a work in progress, so it may not always be up to date. I incorporate improvements and bug fixes regularly.

Main files

  • helpers/create_input.py and helpers/matching.py contain the main algorithm (the preprocessing and matching phases, respectively).

Testing

  • test.ipynb

Description of the algorithm

Goal: Identify the organizations mentioned in a raw affiliation string and match them to the corresponding ROR IDs.

Steps:

  • Preprocessing phase:
      1. Cleaning and stemming
      2. Keyword labeling and partitioning
      3. Partition pruning and string shortening
  • Matching phase:
      1. Candidate identification and refinement
      2. Results disambiguation
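
To make the two phases above more concrete, here is a minimal sketch in Python. It is not the code in helpers/create_input.py or helpers/matching.py: the keyword list, the function names, the `registry` mapping with its placeholder IDs, and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions made for illustration. Stemming is approximated by matching on keyword prefixes.

```python
import re
from difflib import SequenceMatcher

# Hypothetical keyword prefixes; the dictionaries AffRo actually uses are not shown in this README.
ORG_KEYWORDS = ("univ", "institut", "hospital", "laborator", "center", "centre")


def clean(text: str) -> str:
    """Lowercase, drop punctuation except commas/semicolons, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^\w\s,;]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def partition(text: str) -> list[str]:
    """Split on commas/semicolons and keep only parts containing an organization keyword."""
    parts = [p.strip() for p in re.split(r"[,;]", text)]
    return [p for p in parts if any(k in p for k in ORG_KEYWORDS)]


def best_match(part: str, registry: dict[str, str]) -> tuple[str, float]:
    """Return the (id, score) pair of the registry name most similar to `part`."""
    scored = [
        (org_id, SequenceMatcher(None, part, name).ratio())
        for name, org_id in registry.items()
    ]
    return max(scored, key=lambda pair: pair[1])


# Placeholder registry: the IDs below are made up and are not real ROR IDs.
registry = {"university of athens": "ror:A", "university of crete": "ror:B"}

cleaned = clean("Dept. of Physics, University of Athens; Greece")
candidates = partition(cleaned)
matches = [best_match(c, registry) for c in candidates]
```

In this sketch, pruning amounts to discarding partitions without a keyword, and disambiguation is reduced to picking the highest-scoring candidate; the actual algorithm may do considerably more in both steps.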

Basic parameters:

  1. Similarity threshold for university-related organizations (default 0.42).
  2. Similarity threshold for other organizations (default 0.82).
  3. Context window for trimming text around the term "university" (default 3).
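
The sketch below shows one way these three parameters could be applied. It is an illustration under assumptions, not the repository's implementation: the names `MatchParams`, `trim_around_university`, and `accept` are hypothetical, the context window is assumed to be counted in whitespace-separated tokens on each side of the term, and an organization is assumed to count as university-related when its name contains "univ".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatchParams:
    """Defaults taken from the README; the field names are hypothetical."""
    univ_threshold: float = 0.42
    other_threshold: float = 0.82
    context_window: int = 3


def trim_around_university(text: str, window: int) -> str:
    """Keep at most `window` tokens on each side of the first 'univ...' token."""
    tokens = text.split()
    hits = [i for i, t in enumerate(tokens) if t.lower().startswith("univ")]
    if not hits:
        return text  # nothing to trim around
    i = hits[0]
    return " ".join(tokens[max(0, i - window): i + window + 1])


def accept(candidate_name: str, score: float, params: MatchParams = MatchParams()) -> bool:
    """Accept a candidate if its score reaches the threshold for its category."""
    is_univ = "univ" in candidate_name.lower()
    threshold = params.univ_threshold if is_univ else params.other_threshold
    return score >= threshold


params = MatchParams()
trimmed = trim_around_university(
    "department of applied physics national technical university of athens zografou campus greece",
    params.context_window,
)
```

With the defaults, a candidate scoring 0.5 would be accepted if it is university-related and rejected otherwise, which reflects the much stricter threshold for other organizations.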

Contact

If you have any questions, comments, or issues, please feel free to contact me by email at myrto.kallipoliti@gmail.com. Feedback and contributions are also welcome.