
added michele notebook

master
Andrea Mannocci 6 months ago
parent
commit
3854e03d10
  1. notebooks/01-AM-Exploration.ipynb (104689)
  2. notebooks/01.1-MB-Exploration_WorksSource.ipynb (0)
  3. notebooks/02-MM-Cluster_validation.ipynb (2327)
  4. notebooks/03-Supervised.ipynb (751)
  5. src/data/make_dataset.py (1)

104689
notebooks/01-Exploration.ipynb → notebooks/01-AM-Exploration.ipynb

File diff suppressed because one or more lines are too long

0
notebooks/01.1-Exploration_WorksSource.ipynb → notebooks/01.1-MB-Exploration_WorksSource.ipynb

2327
notebooks/02-MM-Cluster_validation.ipynb

File diff suppressed because it is too large

751
notebooks/03-Supervised.ipynb

File diff suppressed because it is too large

1
src/data/make_dataset.py

@@ -125,6 +125,7 @@ def main(input_filepath, output_filepath, external_filepath):
df['primary_email_domain'] = df[df.primary_email.notna()]['primary_email'].apply(lambda x: x.split('@')[1])
df['other_email_domains'] = df[df.other_emails.notna()]['other_emails'].apply(lambda x: extract_email_domains(x))
df['url_domains'] = df[df.urls.notna()]['urls'].apply(lambda x: extract_url_domains(x))
df['other_url_domains'] = df[df.other_urls.notna()]['other_urls'].apply(lambda x: extract_url_domains(x))
logger.info('Creating simple numeric columns')
df['n_emails'] = df.other_emails.str.len()
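The hunk calls two helpers, extract_email_domains and extract_url_domains, whose bodies are not part of this diff. The following is a minimal, hypothetical sketch of what such helpers might look like, assuming the other_emails, urls and other_urls cells hold Python lists of strings. It is an illustration only, not the project's actual implementation; the URL parsing uses the standard library's urllib.parse.urlparse.

```python
from urllib.parse import urlparse


def extract_email_domains(emails):
    """Hypothetical helper: return the domain of each e-mail address in a list."""
    # Assumption: `emails` is a list of strings like ["alice@uni.edu"].
    # Entries without an "@" are skipped; domains are lower-cased.
    return [e.rsplit('@', 1)[1].lower() for e in emails if '@' in e]


def extract_url_domains(urls):
    """Hypothetical helper: return the host name of each URL in a list."""
    domains = []
    for u in urls:
        # urlparse only fills in the network location when the string starts
        # with a scheme or "//", so prefix bare values like "lab.org/home".
        parsed = urlparse(u if '//' in u else '//' + u)
        if parsed.hostname:  # hostname is already lower-cased by urlparse
            domains.append(parsed.hostname)
    return domains


if __name__ == '__main__':
    print(extract_email_domains(["alice@uni.edu", "bob@Lab.org", "not-an-email"]))
    print(extract_url_domains(["https://www.uni.edu/people", "lab.org/home"]))
    # In the pipeline they would be applied row-wise, as in the diff, e.g.:
    # df['url_domains'] = df[df.urls.notna()]['urls'].apply(extract_url_domains)
```

Run as a script, this prints ['uni.edu', 'lab.org'] and ['www.uni.edu', 'lab.org']. Returning a list per row keeps the output aligned with the list-valued input columns, so a count such as n_emails can still be taken with .str.len().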
