Affiliation-Matching Repository [aka AffRo]
This repository contains code for matching affiliation strings to ROR and/or OpenOrg IDs.
🚀 The repository is still a work in progress and may not always be up to date. However, I will incorporate improvements and bug fixes regularly.
Main files
helpers/create_input.py and helpers/matching.py contain the main algorithm (the preprocessing and matching phases, respectively).
Testing
- test.ipynb
Description of the algorithm
Goal: Identify the organizations mentioned in a raw affiliation string and match them to the corresponding ROR IDs.
Steps:
- Preprocessing phase:
  - Cleaning and stemming
  - Keyword labeling and partitioning
  - Partition pruning and string shortening
- Matching phase:
  - Candidate identification and refinement
  - Results disambiguation
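The two phases above can be illustrated with a minimal sketch. It is not the code in helpers/create_input.py or helpers/matching.py: the function names, the keyword list, the toy registry with placeholder IDs, and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions made for illustration. Stemming and results disambiguation are left out to keep it short.

```python
import re
from difflib import SequenceMatcher

# Hypothetical keyword stems used to label partitions that look like organizations.
KEYWORDS = ("univers", "institu", "hospital", "laborator", "college", "centre", "center")

# Toy registry with placeholder IDs (not real ROR IDs); the real project
# matches against ROR / OpenOrg data.
REGISTRY = {
    "university of example": "id-0001",
    "example research institute": "id-0002",
}

def clean(text):
    """Cleaning: lowercase, drop punctuation except separators, collapse spaces."""
    text = re.sub(r"[^a-z0-9,; ]+", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()

def partition(text):
    """Partitioning: split the cleaned string on commas and semicolons."""
    return [part.strip() for part in re.split(r"[,;]", text) if part.strip()]

def prune(parts):
    """Pruning: keep only partitions that contain an organization keyword."""
    return [part for part in parts if any(key in part for key in KEYWORDS)]

def best_match(part, registry):
    """Candidate identification: most similar registry name for one partition."""
    scored = [
        (SequenceMatcher(None, part, name).ratio(), name, org_id)
        for name, org_id in registry.items()
    ]
    score, name, org_id = max(scored)
    return name, org_id, score

def match_affiliation(raw, threshold=0.8):
    """Run both phases and return (name, id, score) for accepted candidates."""
    results = []
    for part in prune(partition(clean(raw))):
        name, org_id, score = best_match(part, REGISTRY)
        if score >= threshold:
            results.append((name, org_id, round(score, 2)))
    return results

print(match_affiliation("Dept. of Physics, University of Example, 12345 City"))
```

In this sketch only the partition "university of example" survives pruning, so the department and the postal address never reach the matching phase.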
Basic parameters:
- Threshold for university-related organizations (default 0.42).
- Threshold for other organizations (default 0.82).
- Context window for trimming text around the term "university" (default 3).
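The sketch below shows one way these three parameters could interact. It assumes that the thresholds are applied to a similarity score between 0 and 1 and that the context window counts whitespace-separated tokens on each side of "university". Neither detail is stated above, and the constant and function names are hypothetical.

```python
# Defaults taken from the parameter list above.
UNIVERSITY_THRESHOLD = 0.42
OTHER_THRESHOLD = 0.82
CONTEXT_WINDOW = 3

def trim_around_university(text, window=CONTEXT_WINDOW):
    """Keep at most `window` tokens on each side of the first 'university' token.

    Assumption: the window is measured in whitespace-separated tokens.
    Strings without a matching token are returned unchanged.
    """
    tokens = text.split()
    for i, token in enumerate(tokens):
        if "univers" in token.lower():
            start = max(0, i - window)
            return " ".join(tokens[start:i + window + 1])
    return text

def passes_threshold(candidate_name, score):
    """Accept a candidate if its score reaches the threshold for its type.

    Assumption: `score` is a similarity in [0, 1] and a candidate counts as
    university-related when its name contains the stem 'univers'.
    """
    is_university = "univers" in candidate_name.lower()
    threshold = UNIVERSITY_THRESHOLD if is_university else OTHER_THRESHOLD
    return score >= threshold

print(trim_around_university(
    "Department of Physics and Astronomy University of Athens Greece 15784 Zografou Campus"
))
print(passes_threshold("university of athens", 0.5))
print(passes_threshold("acme laboratories", 0.5))
```

Under these assumptions, a score of 0.5 is enough to accept a university candidate but not a candidate of any other type.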
Contact
If you have any questions, comments, or issues, please feel free to contact me by email at myrto.kallipoliti@gmail.com. Feedback and contributions are also welcome.