1
0
Fork 0
Commit Graph

1986 Commits

Author SHA1 Message Date
Enrico Ottonello 91d8660982 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi 2021-03-25 11:21:20 +01:00
Enrico Ottonello ebd67b8c8f removed duplicates orcid data on authors set 2021-03-25 11:20:52 +01:00
Claudio Atzori e8789b0cdb Merge pull request 'stats DB for monitor' (#99) from antonis.lempesis/dnet-hadoop:master into master
Looks good to me, just a note on the parsing of the citations: since the last version, IIS produces citations as proper relationships among results. This is what we got already in the BETA graph

```
count		r.reltype	r.subreltype	r.relclass
62.129.254	resultResult	citation	cites
62.043.309	resultResult	citation	isCitedBy
```

Thus, I suggest to move away from the current property based implementation for the extraction of the citation links and start relying on the relationships instead.
2021-03-03 10:29:09 +01:00
Antonis Lempesis 27796343ca crude sleep. hardcoded value 2021-03-03 01:37:47 +02:00
Enrico Ottonello 70cb100647 added updating last orcid dataset folders after completion 2021-03-01 10:17:04 +01:00
Enrico Ottonello bd3b16402b added result typologies 2021-03-01 10:16:02 +01:00
Enrico Ottonello ca1800510a Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi 2021-02-25 18:45:02 +01:00
Enrico Ottonello 53d7023460 dateOfCollection taken from orcid last_update.txt on hdfs; cleaned wf parameters 2021-02-25 18:43:29 +01:00
Enrico Ottonello d43ea88caf aligned orcid result typologies with openaire vocabulary 2021-02-25 15:02:10 +01:00
Enrico Ottonello 975823b968 data from last updated orcid 2021-02-23 15:35:04 +01:00
Antonis Lempesis d90767c733 correctly invalidating metadata 2021-02-19 03:18:47 +02:00
Antonis Lempesis 3681afbe04 typo 2021-02-19 03:04:27 +02:00
Antonis Lempesis c5502eba8f actually moved stats computation in impala instead of hive... 2021-02-19 02:54:39 +02:00
Antonis Lempesis 33c85d4e66 moved stats computation in impala instead of hive 2021-02-18 17:23:34 +02:00
Antonis Lempesis b8e96c8ae7 moved cache update to the end 2021-02-18 16:42:22 +02:00
Antonis Lempesis bcbfc052b1 fixed last errors in step 21 2021-02-18 16:32:54 +02:00
Antonis Lempesis 10a29a4b9a fixes in monitor step 2021-02-18 15:05:59 +02:00
Antonis Lempesis 8ef66452d5 fixed typo 2021-02-17 22:24:44 +02:00
Antonis Lempesis a8836e2f5f fixed typo 2021-02-17 19:27:07 +02:00
Antonis Lempesis a445c1ac3d fixed variable names in monitor script 2021-02-17 16:45:09 +02:00
Antonis Lempesis 00d516360f added missing ; 2021-02-17 16:41:10 +02:00
Antonis Lempesis cd1b794409 added the monitor db wf 2021-02-17 02:11:55 +02:00
Alessia Bardi 32e81c2d89 non validated rel has null value in validated field 2021-02-16 11:01:42 +01:00
Antonis Lempesis 1c029b9fc0 fixed formatting 2021-02-14 03:14:24 +02:00
Antonis Lempesis 2c4dcc90ba analyzing tables to produce stats 2021-02-14 02:54:55 +02:00
Michele Artini 83d815d0bc only stats 2021-02-11 10:57:23 +01:00
Michele Artini 8c836bf930 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2021-02-11 10:54:41 +01:00
Michele Artini 8c1600398a added resumeFrom parameter 2021-02-11 10:54:16 +01:00
Claudio Atzori 3f8f78cbfb Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2021-02-11 09:36:10 +01:00
Claudio Atzori b34b5a39ca index field authoridtypevalue mixes up different author id-type value pairs, dropped in favour of orcidtypevalue 2021-02-11 09:36:04 +01:00
Michele Artini 7249cceb53 switch of 2 nodes 2021-02-11 09:27:08 +01:00
Alessia Bardi 986dd969d3 use the proper import for Lists 2021-02-10 12:03:54 +01:00
Alessia Bardi c4d1feca74 mapper test with validated link to project 2021-02-10 11:22:54 +01:00
Alessia Bardi 09fc7e2f78 serialization of validated flag on relationships 2021-02-10 11:22:09 +01:00
Enrico Ottonello ee4ba7298b fix last update read/write from file on hdfs 2021-02-09 23:24:57 +01:00
Claudio Atzori bc458d1b54 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2021-02-09 16:27:30 +01:00
Claudio Atzori 82e6c50f3f updated solr fields (authoridtypevalue, resultsubject, resultresourcetypename) 2021-02-09 16:27:04 +01:00
Claudio Atzori 62bd3c53ee Merge branch 'master' into provision_indexing 2021-02-09 15:46:26 +01:00
Enrico Ottonello c238561001 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi 2021-02-04 10:44:21 +01:00
Enrico Ottonello 465ce39f75 job execution now based on file last_update.txt on hdfs 2021-02-04 10:44:04 +01:00
Alessia Bardi c67329d3ad updated test for EU Open Data portal datasets 2021-02-03 17:06:48 +01:00
Alessia Bardi fd705404a1 tests for EU Open Data portal dataset mapping 2021-02-03 10:28:17 +01:00
Claudio Atzori f1a852f278 align usage-stats workflow poms with latest snapshot version 2021-01-26 15:42:42 +01:00
Claudio Atzori 9c32119dc2 Merge pull request 'usage-stats-export-wf-v2' (#89) from dimitris.pierrakos/dnet-hadoop:usage-stats-export-wf-v2 into master
Thank you Dimitris!
2021-01-26 15:01:41 +01:00
Claudio Atzori 885e0dd926 [Cleaning] filter authors not providing word characters in the fullname 2021-01-26 09:48:53 +01:00
Claudio Atzori 2890511613 [Cleaning] normalise missing Result.country 2021-01-26 09:41:44 +01:00
Claudio Atzori 4eb9ed35b1 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2021-01-25 18:12:24 +01:00
Claudio Atzori cd379eb5e3 [Cleaning] trying to avoid NPEs, this time by ruling out authors without a defined fullname 2021-01-25 18:11:49 +01:00
Alessia Bardi 505477f36f format code 2021-01-25 18:02:49 +01:00
Alessia Bardi ded6ed8d7d no ',' author, if there are no author in ODF records 2021-01-25 17:57:51 +01:00