Antonis Lempesis
227e10f4b3
commenting out the collab indicators because they still fail
2022-08-05 12:54:36 +03:00
Antonis Lempesis
8b0407d8ec
fixed the datasourceOrganization relations
2022-08-03 12:26:59 +03:00
Antonis Lempesis
1778d40c40
latest version of indicators
2022-08-02 13:39:34 +03:00
Antonis Lempesis
6fc9ef53f6
added command line params to allow hive actions to run
2022-07-29 16:36:20 +03:00
Antonis Lempesis
9886fe87ec
- Added FOS classification
- Added extra orgs in monitor
- Fixed result-project and organization-project tables
2022-07-29 16:34:50 +03:00
Antonis Lempesis
ab18c9daa9
Merge branch 'beta' of https://code-repo.d4science.org/antonis.lempesis/dnet-hadoop into beta
2022-06-09 15:48:21 +03:00
Antonis Lempesis
574492c659
removed double result_apc table creation from monitor
2022-06-09 15:48:13 +03:00
Antonis Lempesis
db088cc69c
fixed *_organization tables
2022-06-07 04:04:28 +03:00
Antonis Lempesis
3fc9efeab6
fixed typo, added open citations and apcs in monitor
2022-05-13 14:28:13 +03:00
Antonis Lempesis
23334479bb
removed yet another collab, added more orgs in monitor
2022-05-11 13:05:52 +03:00
Antonis Lempesis
61b4c19e65
restored indi_result_org_country_collab, removed indi_result_org_collab
2022-05-06 12:52:10 +03:00
Antonis Lempesis
cfbbcaf7c4
commented out indi_result_org_country_collab
2022-05-06 12:49:36 +03:00
Antonis Lempesis
0353f93d54
added new hive opts
2022-04-29 12:49:27 +03:00
Antonis Lempesis
b7cd2c6ca1
added open citations
2022-04-20 14:46:55 +03:00
Antonis Lempesis
c442c91f89
computing stats in each step
2022-04-06 12:40:02 +03:00
Antonis Lempesis
7112806a73
views cannot be stored as parquet...
2022-03-29 16:37:29 +03:00
Antonis Lempesis
fff0b3cc19
added apcs in monitor db
2022-03-29 14:15:31 +03:00
Antonis Lempesis
ee24f3eb2c
views cannot be stored as parquet...
2022-03-29 13:47:48 +03:00
Antonis Lempesis
d8503cd191
added moooar organizations
2022-03-24 14:02:36 +02:00
Antonis Lempesis
62f91b0869
cleanup
2022-03-22 16:17:49 +02:00
Antonis Lempesis
2e8394ecf8
creating aaall tables as parquet
2022-03-22 16:16:08 +02:00
Antonis Lempesis
dcfbeb8142
yet more typos
2022-03-21 12:36:03 +02:00
Antonis Lempesis
ad78e505da
yet another fix
2022-03-03 12:28:12 +02:00
Antonis Lempesis
efeeebfee1
fixed query after the change in the indicator table
2022-03-02 13:29:25 +02:00
Antonis Lempesis
3b92a2ab9c
added the rest of sprint 6 in monitor db
2022-02-23 12:05:57 +02:00
Antonis Lempesis
87c91f70a2
added sprint 6 indicators to monitor db
2022-02-22 14:41:48 +02:00
dimitrispie
58c59f46eb
Added Sprint 6
2022-02-17 10:21:09 +02:00
Antonis Lempesis
5772f92dba
merged beta changes in hive branch
2022-02-15 13:24:51 +02:00
Antonis Lempesis
393a4ee956
fixed yet another typo...
2022-02-15 12:56:50 +02:00
Antonis Lempesis
5f762cbd09
fixed yet another typo
2022-02-07 12:09:12 +02:00
Antonis Lempesis
ae633c566b
fixed the result_result table
2022-02-04 15:04:19 +02:00
Antonis Lempesis
c2b44530a3
typo...
2022-02-03 13:44:07 +02:00
Antonis Lempesis
dbd2646d59
fixed the result_result creation for monitor
2022-02-03 12:37:10 +02:00
Antonis Lempesis
81ee654271
added result_result relations
2021-12-23 15:46:17 +02:00
Antonis Lempesis
7551e52e95
fixed a typo
2021-12-23 15:33:53 +02:00
Antonis Lempesis
16539d7360
added usage stats
2021-12-22 02:54:42 +02:00
Antonis Lempesis
3edd661608
fixed column names
2021-12-21 22:55:04 +02:00
Antonis Lempesis
a4c0cbb98c
fixed typos in indicators. Added extra views in monitor
2021-12-21 15:54:38 +02:00
Antonis Lempesis
58996972d9
added first indicator of sprint 5
2021-12-21 03:35:04 +02:00
dimitrispie
c1cdec09a9
Sprint 5 and other changes
2021-12-20 19:23:57 +02:00
Antonis Lempesis
ddd34087c2
removed 'stored as parquet' from views..
2021-12-13 23:05:00 +02:00
Antonis Lempesis
915f758c82
moving data to impala cluster and creating shadow databases there
2021-12-13 16:26:14 +02:00
Antonis Lempesis
d05210ba99
finished migration to hive only
2021-11-30 19:01:48 +02:00
dimitrispie
09fc2afdca
Added indi_funder_country_collab
Kept only indi_pub_has_cc_licence
2021-11-26 16:13:10 +02:00
Antonis Lempesis
0b4163ee0b
added sprint3,4, removed 2, chaos
2021-11-26 15:58:01 +02:00
Antonis Lempesis
12749a0a77
first
2021-11-26 15:40:40 +02:00
dimitrispie
29f69f2f89
Sprint 4
2021-11-26 15:22:04 +02:00
Antonis Lempesis
cb3adb90f4
Merge branch 'beta' into beta
2021-11-17 14:33:45 +01:00
Antonis Lempesis
c283406829
added Universidad Politécnica de Madrid
2021-11-17 15:33:00 +02:00
Claudio Atzori
e0395719d7
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-11-17 14:17:27 +01:00
Claudio Atzori
82a4e4efae
[cleaning wf] fixed methodology to rule out invalid result titles, based on https://support.openaire.eu/issues/7206
2021-11-17 14:17:22 +01:00
Miriam Baglioni
6d4a1c57ee
[Resolve Entities] Change test dataset to mirror the modification in the creation of the map between the pids and the unresolved
2021-11-17 12:41:52 +01:00
Claudio Atzori
0a727d325d
[dedup] increased number of partitions in the consistency phase
2021-11-16 08:43:41 +01:00
Claudio Atzori
bafa2990f3
code formatting
2021-11-15 17:07:16 +01:00
Claudio Atzori
668ac25224
[graph resolution] using existing argument parser file name
2021-11-15 17:02:45 +01:00
Claudio Atzori
7d0a03f607
[graph resolution] minor
2021-11-15 14:45:54 +01:00
Claudio Atzori
941a50a2fc
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-11-15 14:42:49 +01:00
Claudio Atzori
7c804acda8
[graph resolution] minor
2021-11-15 14:42:43 +01:00
Sandro La Bruzzo
efa09057db
Merge branch 'beta' of code-repo.d4science.org:D-Net/dnet-hadoop into beta
2021-11-15 14:32:09 +01:00
Sandro La Bruzzo
48923e46a1
added documentation to Pubmed Class and also added mvn site for dhp-aggregations
2021-11-15 14:32:01 +01:00
Claudio Atzori
d2c787d416
[graph resolution] fixed sequence of the workflow steps
2021-11-15 14:31:15 +01:00
Claudio Atzori
975b10b711
[actionmanager] increased spark.sql.shuffle.partitions to 5000
2021-11-15 12:31:45 +01:00
Miriam Baglioni
4ec88c718c
merge with beta - resolved conflict in pom
2021-11-15 10:52:16 +01:00
Miriam Baglioni
6f1a434e90
[Bypass Action Set] Fixed test to consider the new identifier utils
2021-11-15 09:59:23 +01:00
Miriam Baglioni
157d33ebf9
[Bypass Action Set] Refactoring
2021-11-15 09:58:48 +01:00
Miriam Baglioni
92d0e18b55
[Bypass Action Set] used constant DOI instead of "doi"
2021-11-12 10:56:58 +01:00
Miriam Baglioni
881113743f
[Bypass Action Set] refactoring
2021-11-12 10:55:50 +01:00
Miriam Baglioni
47ccb53c4f
[Bypass Action Set] modification for comment D-Net/dnet-hadoop#157 (comment)
2021-11-12 10:54:09 +01:00
Miriam Baglioni
ffb0ce1d59
merge with beta - resolved conflict in pom
2021-11-12 10:19:59 +01:00
Miriam Baglioni
716021546e
[Bypass Action Set] minor fix
2021-11-12 10:18:01 +01:00
Sandro La Bruzzo
3469cc2b1d
Merge branch 'beta' of code-repo.d4science.org:D-Net/dnet-hadoop into beta
2021-11-12 09:56:52 +01:00
Sandro La Bruzzo
a7763d2492
removed alternate identifier in resolutionMap
2021-11-12 09:56:45 +01:00
Miriam Baglioni
935062edec
[Bypass Action Set] creation of unresolved entities
2021-11-11 16:11:25 +01:00
Antonis Lempesis
26f086dd64
removed the too restrictive clause. will discuss again
2021-11-11 12:57:19 +02:00
Claudio Atzori
148289150f
Merge branch 'beta' into doiboost_url
2021-11-11 10:40:19 +01:00
Sandro La Bruzzo
2ca0a436ad
added SparkResolveEntities node to the oozie wf
2021-11-11 10:25:42 +01:00
Sandro La Bruzzo
9cb195314f
implemented and tested resolution of entities
2021-11-11 10:17:40 +01:00
Miriam Baglioni
6d3c4c4abe
merging with branch beta
2021-11-11 08:59:53 +01:00
Miriam Baglioni
c371b23077
-
2021-11-10 17:00:37 +01:00
Miriam Baglioni
9e214ce0eb
[BypassAS] addition of OC relations
2021-11-09 12:07:19 +01:00
Sandro La Bruzzo
6477a40670
implement filter of openCitation
2021-11-09 11:27:12 +01:00
Miriam Baglioni
6f7ca539c6
[BypassAS] update of results for bipFinder and FOS
2021-11-09 11:25:41 +01:00
Miriam Baglioni
a7d50c499b
[BypassAS] prepare FOS subject, test and model for FOS and BipFinder scores
2021-11-08 16:44:19 +01:00
Antonis Lempesis
91354c6068
- fetching all context related results
- storing tables as parquet
2021-11-08 15:15:46 +02:00
Miriam Baglioni
df7ee77c7a
[DOIBoost Mapping] removed not needed comments
2021-11-04 16:24:07 +01:00
Miriam Baglioni
de63d29b6f
[DOIBoost Mapping] Fix to avoid producing results with null as identifier (probably due to the filtering function in the factory for the creation of the id)
2021-11-04 16:16:40 +01:00
Miriam Baglioni
d50057b2d9
[DOIBoost Mapping] changed the way to create the url for the instance: we use the Crossref guidelines https://doi.org/doi
2021-11-03 16:59:37 +01:00
Miriam Baglioni
edf55395e9
added test resource
2021-11-03 16:49:30 +01:00
Miriam Baglioni
d97ea82a29
[DOIBoost Mapping] Added test to verify the instance created for Crossref will have just the url related to the doi
2021-11-03 16:45:15 +01:00
Miriam Baglioni
96769b4481
[DOIBoost - Mapping] Changed the logic which brought into the instance urls that should not be there: the url of the doi in the json is reachable from the root (json/"URL"); other urls were added from the links element. Now the mapping from the link element has been removed
2021-11-03 16:43:36 +01:00
Miriam Baglioni
683fe093cf
[DOIBoost - Mapping] Remove the addition of the instance to the MAG publication record
2021-11-03 15:51:26 +01:00
Miriam Baglioni
b2bb8d9d79
[DOIBoost - Mapping] selecting the url from Crossref containing the doi
2021-11-03 15:44:57 +01:00
Miriam Baglioni
779318961c
[DOIBoost - Mapping] removed the url from crossref containing the api.elsevier.com... string in the url
2021-11-03 14:38:52 +01:00
Miriam Baglioni
2480e590d1
[DOIBoost - Mapping] changed the type on which to map dissertation from Crossref: from 006 Doctoral thesis to 0044 Thesis since dissertation could be either Doctoral or master thesis
2021-11-03 14:25:23 +01:00
Sandro La Bruzzo
7bd224f051
implement first version of scholexplorer integration for the generation of final graph
2021-11-02 15:58:15 +01:00
Claudio Atzori
7fa49f6956
Merge pull request 'removed hardcoded reference' (#154) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: D-Net/dnet-hadoop#154
2021-11-02 09:11:30 +01:00
Antonis Lempesis
f78afb5ef9
removed hardcoded reference
2021-11-01 15:42:29 +02:00
Claudio Atzori
1225ba0b92
[resolution] increasing number of partitions to avoid OOM
2021-10-28 16:18:17 +02:00
Sandro La Bruzzo
d9cbca83f7
moved filter to next phase
2021-10-28 16:13:24 +02:00
Sandro La Bruzzo
1be9aa0a5f
Removed filter of datacite items from the raw graph merging phase; Datacite is not an actionset anymore in beta
2021-10-26 17:52:20 +02:00
Sandro La Bruzzo
4acfa8fa2e
Scholexplorer Datasource Aggregation:
- Added collectedfrom in the inverse relation generated
Relation resolution:
- increased number of partitions in workflow.xml
- using classid instead of classname to build the pid-dnetId mapping
2021-10-26 17:51:20 +02:00
Sandro La Bruzzo
034304b33a
conflict resolved on merge
2021-10-26 09:40:47 +02:00
Claudio Atzori
d147295c2f
avoiding java.io.NotSerializableException: java.util.HashMap
2021-10-21 14:15:57 +02:00
Claudio Atzori
3702fe478d
cleanup
2021-10-21 12:05:02 +02:00
Sandro La Bruzzo
ac36aa7d1c
fixed wrong Encoding during a map phase
2021-10-21 11:35:02 +02:00
Sandro La Bruzzo
aeeebd573b
code refactor renamed datacite package
2021-10-20 17:37:42 +02:00
Sandro La Bruzzo
ab3a99d3e9
removed old datacite oozie workflow
2021-10-20 17:19:47 +02:00
Sandro La Bruzzo
ae4e99a471
Adapted workflow of resolution of PID to work into OpenAIRE data workflow
- Added relations in both directions on all Scholexplorer datasources
2021-10-20 17:12:16 +02:00
Claudio Atzori
4f8970f8ed
[stats] reducing the step22 wait time
2021-10-20 14:14:53 +02:00
Claudio Atzori
00b78b9c58
cleanup: mapping contents in the graph already defined in the OAF graph model doesn't require to be aware of the vocabularies
2021-10-20 14:04:45 +02:00
Claudio Atzori
c01dd0c925
registered oaf model classes for the KryoSerializer
2021-10-20 13:55:07 +02:00
Claudio Atzori
59f76b50d4
Merge branch 'beta' into hierarchical_orgs_relations
2021-10-20 09:42:35 +02:00
Antonis Lempesis
241dcf6df1
Merge branch 'beta' into beta
2021-10-19 23:54:21 +02:00
Claudio Atzori
515e068a78
Merge branch 'beta' into hierarchical_orgs_relations
2021-10-19 16:46:06 +02:00
Claudio Atzori
512e7b0170
code formatting
2021-10-19 16:19:29 +02:00
Claudio Atzori
e9157c67aa
Merge branch 'beta' into dump
2021-10-19 16:15:03 +02:00
Claudio Atzori
98f37c8d81
WIP: workflow nodes for including Scholexplorer records in the RAW graph
2021-10-19 16:14:40 +02:00
Claudio Atzori
c8850456e9
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-10-19 16:09:54 +02:00
Sandro La Bruzzo
c9870c5122
code formatted
2021-10-19 15:24:59 +02:00
Sandro La Bruzzo
f8329bc110
since dhp-schemas changed, introducing new Relation inverse model, this class has been updated
2021-10-19 15:24:22 +02:00
Claudio Atzori
7a73010acd
WIP: workflow nodes for including Scholexplorer records in the RAW graph
2021-10-19 11:59:16 +02:00
Miriam Baglioni
c7f6cd2591
added again the setting for saXReader
2021-10-19 10:15:26 +02:00
miconis
5f780a6ba1
bug fix in migrate entities: parameter name was wrong
2021-10-18 23:30:40 +02:00
Miriam Baglioni
1315952702
merge with branch beta
2021-10-18 14:17:09 +02:00
Miriam Baglioni
1cc09adfaa
Opencitations: changed the test class to mirror the creation or not of duplicate dois for .refs oc original plus added optional parameter to duplicate the relation
2021-10-18 14:11:27 +02:00
Miriam Baglioni
76d41602be
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-10-18 10:53:22 +02:00
Miriam Baglioni
46f82c7c8f
removed not needed folder deletion
2021-10-18 10:53:16 +02:00
Sandro La Bruzzo
7b15b88d4c
renamed wrong package, implemented last aggregation workflow for scholexplorer
2021-10-15 15:00:15 +02:00
Antonis Lempesis
41ecb1eb61
invalidating metadata before context thingies
2021-10-15 13:42:55 +03:00
Antonis Lempesis
4b7c8dff2d
fetching affiliated results for 4 orgs in monitor. fixed affiliated orgs in stats db
2021-10-14 18:53:35 +03:00
Sandro La Bruzzo
51a03c0a50
refactor code for EBI from dhp-graph-mapper into dhp-aggregation
2021-10-14 14:23:13 +02:00
Claudio Atzori
14fbf92ad6
Merge branch 'beta' into beta_solr_config
2021-10-14 11:08:44 +02:00
Claudio Atzori
b292e4a700
[stats wf] added extra logging in the context data retrieval phase
2021-10-13 17:31:53 +02:00
miconis
995c1eddaf
minor change
2021-10-13 17:07:10 +02:00
Miriam Baglioni
5d9cc2452d
changed the working path parameter value to be dependent on the dnet-workflow working dir parameter
2021-10-13 15:33:50 +02:00
miconis
326bf63775
integration of parent child orgs relations
2021-10-13 12:24:48 +02:00
Miriam Baglioni
16b28494a9
added new parameter in the doiboost process workflow to specify a folder for the process of MAG dataset
2021-10-13 11:34:24 +02:00
Miriam Baglioni
63933808d4
added fix for mixing result types, added configuration default to funder subworkflow
2021-10-13 11:28:28 +02:00
Sandro La Bruzzo
7387416e90
added params skip update to direct transform in OAF, this should be set to true in production
2021-10-12 12:36:30 +02:00
Sandro La Bruzzo
511da98d0c
- fixed bug on download pmc Article
- removed unused line of code in SparkCreateActionset
2021-10-12 11:47:49 +02:00
Miriam Baglioni
fec40bdd95
merging with branch beta - resolved conflicts
2021-10-12 09:16:36 +02:00
Miriam Baglioni
83f51f1812
refactoring
2021-10-12 09:14:43 +02:00
Sandro La Bruzzo
5606014b17
code refactor see ticket #7065
2021-10-12 08:11:53 +02:00
Serafeim Chatzopoulos
201ce71cc1
Add resultsubject, relprojectname and resultacceptanceyear to __all field
2021-10-11 13:16:39 +03:00
Serafeim Chatzopoulos
e468a7b96b
Add tests to query Solr with different configurations
2021-10-08 16:58:51 +03:00
Serafeim Chatzopoulos
de81007302
Add exploreTestConfig, a new Solr configuration folder
2021-10-08 16:54:56 +03:00
Sandro La Bruzzo
8f99d2af86
Make the node of doiBoost point to the correct OpenAIRE Organization in relations
2021-10-08 08:35:12 +02:00
Alessia Bardi
c48c43fa9e
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-10-07 17:30:53 +02:00
Alessia Bardi
8d3b60f446
test for patching records for EOSC Future
2021-10-07 17:30:45 +02:00
miconis
611ca511db
set configuration property in openorgs duplicates wf
2021-10-07 15:39:55 +02:00