Sandro La Bruzzo
7bd224f051
implement first version of scholexplorer integration for the generation of final graph
2021-11-02 15:58:15 +01:00
Claudio Atzori
7fa49f6956
Merge pull request 'removed hardcoded reference' ( #154 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: D-Net/dnet-hadoop#154
2021-11-02 09:11:30 +01:00
Antonis Lempesis
f78afb5ef9
removed hardcoded reference
2021-11-01 15:42:29 +02:00
Claudio Atzori
1225ba0b92
[resolution] increasing number of partitions to avoid OOM
2021-10-28 16:18:17 +02:00
Sandro La Bruzzo
d9cbca83f7
moved filter on next phase
2021-10-28 16:13:24 +02:00
Sandro La Bruzzo
1be9aa0a5f
Removed filter of datacite items from the raw graph merging phase, Datacite is not an actionset anymore in beta
2021-10-26 17:52:20 +02:00
Sandro La Bruzzo
4acfa8fa2e
Scholexplorer Datasource Aggregation:
...
- Added collectedfrom in the inverse relation generated
Relation resolution:
- increased number of partitions in workflow.xml
- using classid instead of classname to build the pid-dnetId mapping
2021-10-26 17:51:20 +02:00
Sandro La Bruzzo
034304b33a
conflict resolved on merge
2021-10-26 09:40:47 +02:00
Claudio Atzori
d147295c2f
avoiding java.io.NotSerializableException: java.util.HashMap
2021-10-21 14:15:57 +02:00
Claudio Atzori
3702fe478d
cleanup
2021-10-21 12:05:02 +02:00
Sandro La Bruzzo
ac36aa7d1c
fixed wrong Encoding during a map phase
2021-10-21 11:35:02 +02:00
Sandro La Bruzzo
aeeebd573b
code refactor renamed datacite package
2021-10-20 17:37:42 +02:00
Sandro La Bruzzo
ab3a99d3e9
removed old datacite oozie workflow
2021-10-20 17:19:47 +02:00
Sandro La Bruzzo
ae4e99a471
Adapted workflow of resolution of PID to work into OpenAIRE data workflow
...
- Added relations in both directions on all Scholexplorer datasources
2021-10-20 17:12:16 +02:00
Claudio Atzori
4f8970f8ed
[stats] reducing the step22 wait time
2021-10-20 14:14:53 +02:00
Claudio Atzori
00b78b9c58
cleanup: mapping contents in the graph already defined in the OAF graph model doesn't require to be aware of the vocabularies
2021-10-20 14:04:45 +02:00
Claudio Atzori
c01dd0c925
registered oaf model classes for the KryoSerializer
2021-10-20 13:55:07 +02:00
Claudio Atzori
59f76b50d4
Merge branch 'beta' into hierarchical_orgs_relations
2021-10-20 09:42:35 +02:00
Antonis Lempesis
241dcf6df1
Merge branch 'beta' into beta
2021-10-19 23:54:21 +02:00
Claudio Atzori
515e068a78
Merge branch 'beta' into hierarchical_orgs_relations
2021-10-19 16:46:06 +02:00
Claudio Atzori
512e7b0170
code formatting
2021-10-19 16:19:29 +02:00
Claudio Atzori
e9157c67aa
Merge branch 'beta' into dump
2021-10-19 16:15:03 +02:00
Claudio Atzori
98f37c8d81
WIP: workflow nodes for including Scholexplorer records in the RAW graph
2021-10-19 16:14:40 +02:00
Claudio Atzori
c8850456e9
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-10-19 16:09:54 +02:00
Sandro La Bruzzo
c9870c5122
code formatted
2021-10-19 15:24:59 +02:00
Sandro La Bruzzo
f8329bc110
since dhp-schemas changed, introducing new Relation inverse model, this class has been updated
2021-10-19 15:24:22 +02:00
Claudio Atzori
7a73010acd
WIP: workflow nodes for including Scholexplorer records in the RAW graph
2021-10-19 11:59:16 +02:00
Miriam Baglioni
c7f6cd2591
added again the setting for saXReader
2021-10-19 10:15:26 +02:00
miconis
5f780a6ba1
bug fix in migrate entities: parameter name was wrong
2021-10-18 23:30:40 +02:00
Miriam Baglioni
1315952702
merge with branch beta
2021-10-18 14:17:09 +02:00
Miriam Baglioni
1cc09adfaa
Opencitations: changed the test class to mirror the creation or not of duplicate dois for .refs oc original plus added optional parameter to duplicate the relation
2021-10-18 14:11:27 +02:00
Miriam Baglioni
76d41602be
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-10-18 10:53:22 +02:00
Miriam Baglioni
46f82c7c8f
removed not needed folder deletion
2021-10-18 10:53:16 +02:00
Sandro La Bruzzo
7b15b88d4c
renamed wrong package, implemented last aggregation workflow for scholexplorer
2021-10-15 15:00:15 +02:00
Antonis Lempesis
41ecb1eb61
invalidating metadata before context thingies
2021-10-15 13:42:55 +03:00
Antonis Lempesis
4b7c8dff2d
fetching affiliated results for 4 orgs in monitor. fixed affiliated orgs in stats db
2021-10-14 18:53:35 +03:00
Sandro La Bruzzo
51a03c0a50
refactor code for EBI from dhp-graph-mapper into dhp-aggregation
2021-10-14 14:23:13 +02:00
Claudio Atzori
14fbf92ad6
Merge branch 'beta' into beta_solr_config
2021-10-14 11:08:44 +02:00
Claudio Atzori
b292e4a700
[stats wf] added extra logging in the context data retrieval phase
2021-10-13 17:31:53 +02:00
miconis
995c1eddaf
minor change
2021-10-13 17:07:10 +02:00
Miriam Baglioni
5d9cc2452d
changed the working path parameter value to depend on the dnet-workflow working dir parameter
2021-10-13 15:33:50 +02:00
miconis
326bf63775
integration of parent child orgs relations
2021-10-13 12:24:48 +02:00
Miriam Baglioni
16b28494a9
added new parameter in the doiboost process workflow to specify a folder for the process of MAG dataset
2021-10-13 11:34:24 +02:00
Miriam Baglioni
63933808d4
added fix for mixing result types, added configuration default to funder subworkflow
2021-10-13 11:28:28 +02:00
Sandro La Bruzzo
7387416e90
added params skip update to direct transform in OAF, this should be set to true in production
2021-10-12 12:36:30 +02:00
Sandro La Bruzzo
511da98d0c
- fixed bug on download pmc Article
...
- removed unused line of code in SparkCreateActionset
2021-10-12 11:47:49 +02:00
Miriam Baglioni
fec40bdd95
merging with branch beta - resolved conflicts
2021-10-12 09:16:36 +02:00
Miriam Baglioni
83f51f1812
refactoring
2021-10-12 09:14:43 +02:00
Sandro La Bruzzo
5606014b17
code refactor see ticket #7065
2021-10-12 08:11:53 +02:00
Serafeim Chatzopoulos
201ce71cc1
Add resultsubject, relprojectname and resultacceptanceyear to __all field
2021-10-11 13:16:39 +03:00
Serafeim Chatzopoulos
e468a7b96b
Add tests to query Solr with different configurations
2021-10-08 16:58:51 +03:00
Serafeim Chatzopoulos
de81007302
Add exploreTestConfig, a new Solr configuration folder
2021-10-08 16:54:56 +03:00
Sandro La Bruzzo
8f99d2af86
Make the doiBoost node point to the correct OpenAIRE organization in relations
2021-10-08 08:35:12 +02:00
Alessia Bardi
c48c43fa9e
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-10-07 17:30:53 +02:00
Alessia Bardi
8d3b60f446
test for patching records for EOSC Future
2021-10-07 17:30:45 +02:00
miconis
611ca511db
set configuration property in openorgs duplicates wf
2021-10-07 15:39:55 +02:00
miconis
9646b9fd98
implementation of the http call for the update of openorgs suggestions
2021-10-07 11:29:11 +02:00
Sandro La Bruzzo
2557bb41f5
Implemented new method for update baseline inside scala node
2021-10-06 16:41:08 +02:00
Sandro La Bruzzo
b84e0cabeb
Implemented new method for update baseline
2021-10-05 16:34:47 +02:00
Sandro La Bruzzo
f258bbb927
Merge branch 'beta' of code-repo.d4science.org:D-Net/dnet-hadoop into beta
2021-10-05 10:21:50 +02:00
Sandro La Bruzzo
991b06bd0b
removed generation of EBI links from old dump, now EBI link dump is created by another wf
2021-10-05 10:21:33 +02:00
Claudio Atzori
cb7efe12ac
Merge pull request 'beta' ( #146 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: D-Net/dnet-hadoop#146
2021-10-05 10:09:37 +02:00
Miriam Baglioni
e653756e3d
applied some suggestions from Sonar Lint
2021-10-04 18:40:07 +02:00
dimitrispie
3f25d2efb2
Merge branch 'beta' of https://code-repo.d4science.org/antonis.lempesis/dnet-hadoop into beta
2021-10-01 16:03:48 +03:00
dimitrispie
13687fd887
Sprint 3 indicators update
2021-10-01 16:02:02 +03:00
Miriam Baglioni
9814c3e700
merging with branch beta
2021-10-01 13:00:03 +02:00
Miriam Baglioni
c4ccd7b32c
-
2021-10-01 12:59:47 +02:00
Miriam Baglioni
c8321ad31a
merge with branch beta
2021-10-01 12:59:08 +02:00
Claudio Atzori
60a6a9a583
[graph2hive] added field 'measures' to the result view
2021-09-30 09:27:26 +02:00
Sandro La Bruzzo
66702b1973
Added node to update datacite
2021-09-28 08:59:06 +02:00
Sandro La Bruzzo
477cb10715
Merge remote-tracking branch 'origin/beta' into beta
2021-09-27 16:57:23 +02:00
Sandro La Bruzzo
be79d74e3d
Fixed DoiBoost generation to point to correct organization in affiliation relation
2021-09-27 16:57:04 +02:00
Claudio Atzori
474117c2e8
Merge branch 'beta' into dedup_whitelist
2021-09-27 16:41:25 +02:00
Miriam Baglioni
476a4708d6
merging with branch beta
2021-09-27 16:02:32 +02:00
Miriam Baglioni
5ec69889db
OpenCitations: creation of AS from OC
2021-09-27 16:02:06 +02:00
Claudio Atzori
a53acfbc06
Merge pull request '[stats] updates in the mapping, indicators, wf' ( #145 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: D-Net/dnet-hadoop#145
2021-09-27 15:59:54 +02:00
Alessia Bardi
b924276e18
tests to generate records for the EOSC-Future demo with the EOSC Jupyter Notebook subject
2021-09-24 17:11:56 +02:00
Antonis Lempesis
a1e1cf32d7
fixed an impala error
2021-09-24 12:57:24 +03:00
Antonis Lempesis
f358cabb2b
fixed typo
2021-09-22 21:50:37 +03:00
Miriam Baglioni
eedf7c3310
merging with branch beta
2021-09-22 15:18:34 +02:00
Miriam Baglioni
f2118d771a
first steps in the implementation of the integration of opencitations
2021-09-22 15:18:05 +02:00
Claudio Atzori
7fa60e166e
Merge branch 'beta' into dedup_whitelist
2021-09-22 11:31:18 +02:00
Antonis Lempesis
421d55265d
created hive action for observatory queries
2021-09-21 03:07:58 +03:00
Enrico Ottonello
92a63f78fe
multiple download attempts handling if a connection to orcid server fails
2021-09-20 18:25:00 +02:00
Enrico Ottonello
0c74f5667e
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-09-20 18:12:31 +02:00
miconis
853333bdde
implementation of the whitelist for similarity relations
2021-09-20 16:21:47 +02:00
Antonis Lempesis
8b681dcf1b
attempt to make the observatory wf run in hive
2021-09-18 00:35:14 +03:00
Antonis Lempesis
2943287d10
fixed the definition of cc_licence, part II
2021-09-16 15:59:06 +03:00
Antonis Lempesis
dd2329849f
fixed the definition of cc_licence
2021-09-16 13:50:34 +03:00
Claudio Atzori
09c2eb7f62
Merge branch 'beta' into clean_relations
2021-09-16 11:09:47 +02:00
Miriam Baglioni
e9ccdf853f
related to D-Net/dnet-hadoop#132
2021-09-15 18:44:54 +02:00
Claudio Atzori
12766bf5f2
Merge branch 'beta' into clean_relations
2021-09-15 17:18:15 +02:00
Claudio Atzori
663b1556d7
manually integrating PR#140 D-Net/dnet-hadoop#140
2021-09-15 16:40:25 +02:00
Claudio Atzori
ebf53a1616
added cleaning for relation fields: subRelType & relClass according to dedicated vocabs
2021-09-15 16:10:37 +02:00
Enrico Ottonello
8b804e7fe1
removed unused imports
2021-09-14 17:30:52 +02:00
Enrico Ottonello
aefa36c54b
other task executions go ahead if UnknownHostException happens on a single task
2021-09-14 17:26:15 +02:00
Antonis Lempesis
de9bf3a161
added cc_licences and abstracts in observatory db
2021-09-14 01:29:08 +03:00
Antonis Lempesis
9b1936701c
fixed yet another typo
2021-09-13 21:07:44 +03:00
Antonis Lempesis
8fc89ae822
moved context table creation before indicators
2021-09-13 14:33:23 +03:00
Antonis Lempesis
461bf90ca6
fixed the gold_oa definition
2021-09-13 11:10:30 +03:00
Antonis Lempesis
43852bac0e
creating other::other concept for all contexts
2021-09-13 01:36:41 +03:00
Antonis Lempesis
f13cca7e83
moved dependencies of indicators before them...
2021-09-08 23:07:58 +03:00
Antonis Lempesis
c6ada217a1
fixed typo
2021-09-08 22:34:59 +03:00
Antonis Lempesis
1250ae197f
using new indicators for the definition of peerreviewed, gold, and green
2021-09-08 14:08:43 +03:00
Antonis Lempesis
ccee451dde
added indicators of sprint 2 in monitor db
2021-09-07 23:17:13 +03:00
Sandro La Bruzzo
aed29156c7
changed behavior of the transformation job so that it doesn't fail at first error
2021-09-07 19:05:46 +02:00
Sandro La Bruzzo
3c6fc2096c
fixed bug in OAI iterator that skipped cleaned records
2021-09-07 10:46:26 +02:00
Sandro La Bruzzo
d4dadf6d77
reduced max number of PID in Relatedentity
2021-09-02 14:21:24 +02:00
Sandro La Bruzzo
9f8a80deb7
fixed wrong import of unresolved relation in openaire
2021-09-01 14:16:27 +02:00
Alessia Bardi
3762b17f7b
added VERSION and PART relationship and re-ordered according to my personal and obviously possibly biased
...
ordering
2021-08-31 20:20:05 +02:00
Sandro La Bruzzo
e8b3cb9147
Implemented method to download delta updates in EBI Links
2021-08-30 09:32:45 +02:00
Alessia Bardi
ccf4103a25
keep the original url if the decoder fails for any reason
2021-08-25 10:07:58 +02:00
Sandro La Bruzzo
45898c71ac
fixed wrong doi in pubmed
2021-08-24 15:20:04 +02:00
Alessia Bardi
00a28c0080
originalId was renamed to acronym
2021-08-23 15:02:21 +02:00
Alessia Bardi
f19b04d41b
code formatting after mvn compile
2021-08-23 14:33:39 +02:00
Alessia Bardi
931f430129
Merge branch 'beta' into datasource_model_eosc_beta
2021-08-23 11:57:21 +02:00
Alessia Bardi
4c1474e693
Dealing with #6859#note-2: we have to decode URLs to avoid & and other chars encoded because of the original XML representation of data
2021-08-20 17:03:30 +02:00
Miriam Baglioni
5f8ccbc365
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-08-20 11:13:47 +02:00
Miriam Baglioni
882abb40e4
CrossrefDump -
2021-08-20 11:12:53 +02:00
Miriam Baglioni
45c62609af
CrossrefDump - modified because parameter file was moved
2021-08-20 11:12:31 +02:00
Miriam Baglioni
35880c0e7b
CrossrefDump - changed the wf to be able to resume from one of the steps
2021-08-20 11:11:35 +02:00
Miriam Baglioni
f3b6c392c1
CrossrefDump - moving parameter file under folder crossref_dump_reader
2021-08-20 11:10:58 +02:00
Miriam Baglioni
65822400ce
CrossrefDump - added new parameter file that was missing
2021-08-20 11:10:35 +02:00
Alessia Bardi
a053e1513c
different funders in blacklist from BETA and PROD aggregator
2021-08-19 11:32:27 +02:00
Alessia Bardi
812bd54c57
different funders in blacklist from BETA and PROD aggregator
2021-08-19 11:30:14 +02:00
Miriam Baglioni
a65d3caaea
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-08-19 10:29:10 +02:00
Miriam Baglioni
e5cf11d088
change open access route to result matching hbm to gold
2021-08-19 10:29:04 +02:00
Claudio Atzori
7c0c67bdd6
added mock pom
2021-08-13 17:45:53 +02:00
Claudio Atzori
82086f3422
fixed directory name
2021-08-13 17:42:14 +02:00
Claudio Atzori
bc7068106c
added crossref download oozie workflow
2021-08-13 17:19:44 +02:00
Claudio Atzori
2c0a05f11a
manually merged PR#139
2021-08-13 17:15:53 +02:00
Claudio Atzori
d43667d857
Merge pull request 'Automatic download of Crossref' ( #138 ) from crossref_dw_wf into beta
...
Reviewed-on: D-Net/dnet-hadoop#138
2021-08-13 17:10:10 +02:00
Miriam Baglioni
5856ca8a7b
merging with branch beta - resolved conflicts
2021-08-13 16:45:45 +02:00
Miriam Baglioni
6fec71e8d2
removed the specific of the infra we are running the wf from the wf name
2021-08-13 16:39:02 +02:00
Miriam Baglioni
ed7e28490a
change in sh
2021-08-13 16:19:01 +02:00
Claudio Atzori
7743d0f919
consolidated dnet wf profiles into the same submodule
2021-08-13 16:14:54 +02:00
Miriam Baglioni
6eb7508995
merging with branch beta
2021-08-13 16:07:04 +02:00
Claudio Atzori
f74adc4752
added DownloadCSV2 as alternative implementation of the same download procedure
2021-08-13 15:52:15 +02:00
Claudio Atzori
5f0903d50d
fixed CSV downloader & tests
2021-08-13 14:17:54 +02:00
Claudio Atzori
17cefe6a97
[HBM] removed stale replace option
2021-08-13 12:43:59 +02:00
Claudio Atzori
7ee2757fcd
fixed DownloadCSV parameters spec; workflow patching the hostedby replaces the graph content (publication, datasource) rather than creating a copy
2021-08-13 12:41:01 +02:00
Claudio Atzori
c3ad4ab701
minor fixes
2021-08-13 12:23:15 +02:00
Claudio Atzori
baed5e3337
test classes moved in specific components
2021-08-13 12:14:47 +02:00
Claudio Atzori
3359f73fcf
cleanup & best practices
2021-08-13 12:00:42 +02:00
Miriam Baglioni
f4ec81c92c
merging with branch beta
2021-08-13 10:31:35 +02:00
Miriam Baglioni
dc8b05b39e
Hosted By Map - changed the association with the datasource id for the hostedby element: there is no longer any need to compute it. With the new HBM it is already the id in the graph
2021-08-13 10:18:25 +02:00
Miriam Baglioni
32fd75691f
refactoring
2021-08-13 10:15:42 +02:00
Miriam Baglioni
01db1f8bc4
GetCSV refactoring - removed not needed import
2021-08-13 10:14:17 +02:00
Miriam Baglioni
964a46ca21
GetCSV refactoring - modified due to movement of classes
2021-08-13 10:11:18 +02:00
Miriam Baglioni
eaf077fc34
GetCSV refactoring - removed not needed dependency
2021-08-13 10:08:58 +02:00