Useful notes
*************

This section contains tips that help with understanding function usage, ease
shell interaction, and introduce some basic characteristics of dynamic virtual tables.

.. _unotesexamples:

Understanding function examples
===============================

While reading each function's description, you will notice examples like this:

    >>> sql("select ifthenelse(1>0,'yes','no') as answer")
    answer
    ------
    yes

Because examples also serve as tests for madIS's functions, they are written and displayed as actual Python code. The string passed to the Python function "sql()" is
an SQL query, and its results are shown in the lines that follow it. Query results in examples also include the column names for clarity.
In the interactive shell, however, column names are not shown. There, the same example would look like this::

    mterm> select ifthenelse(1>0,'yes','no') as answer;
    yes
    Query executed in 0 min. 0 sec 33 msec

.. _unotesparameters:

Understanding function parameters
=================================

**Named parameters**

All types of functions (row, aggregate and virtual table) can take simple or named parameters.
For simple parameters, ordering is important and should follow the function's definition. Named
parameters can be placed anywhere in the parameter list, with a few exceptions that are clearly stated.
Named parameters are strings (so they must be placed inside quotes) using the format *paramname:paramvalue*.

For example, the :mod:`~functions.vtable.file` function is declared as::

    file(location[, formatting options])

where *location* is a simple parameter containing the file name or network resource to read from,
and *formatting options* is a list of named parameters (e.g. the parameter *dialect*, which accepts one of the values *tsv* or *csv*).
So using this function to import a CSV file would look like this::

    select * from file('myfile','dialect:csv');

By using inverted syntax (see :ref:`SQL extensions <inversion>`), the query would look like this::

    select * from (file 'myfile' dialect:csv);

Notice that, in inverted syntax, non-named parameters must be quoted, whereas quoting named parameters is optional. Also, parameters are separated by spaces instead of commas.
If a space character appears in a named parameter's value, quoting should be used.
So importing a file that uses spaces as value delimiters would look like the following (using inverted syntax)::

    select * from (file 'myfile' 'delimiter: ');

An example of a function definition that contains named parameters is :mod:`~functions.vtable.output`::

    output(query:None, file[, formatting options])

The definition above states that the function *output* receives the named parameter *query*, which has no default value (the value following the ":" character
of a named parameter in a function's definition represents the default value, here *None*).

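
The *paramname:paramvalue* convention can be illustrated with a small Python sketch. This is not madIS's actual parser: the helper name *split_params* and its rule (an argument counts as named when the text before its first colon is a plain identifier and the colon does not start a URL scheme such as *http://*) are assumptions made purely for illustration.

```python
import re

# Hypothetical helper, for illustration only: NOT madIS's real parser.
# Assumption: an argument is "named" when an identifier precedes its first ':'
# and that ':' is not followed by '//' (so URLs stay simple parameters).
NAMED = re.compile(r"^([A-Za-z_]\w*):(?!//)(.*)$", re.S)


def split_params(args):
    """Split arguments into ordered simple parameters and a dict of named ones."""
    simple, named = [], {}
    for arg in args:
        match = NAMED.match(arg)
        if match:
            named[match.group(1)] = match.group(2)
        else:
            simple.append(arg)
    return simple, named


# Named parameters may appear anywhere; simple ones keep their relative order.
print(split_params(["myfile", "dialect:csv"]))
print(split_params(["delimiter: ", "myfile"]))
```

Under these assumptions, the first call separates *myfile* from the *dialect* setting, and the second shows that a quoted value may carry a space.
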
.. _unoteqparameter:

**The query parameter**

The named parameter *query*, shown in the definition above, defines the SQL query that the function *output* will execute;
the query's results are then written to the given *file*.
When a function that takes a *query* parameter is used in inverted syntax (see :ref:`SQL extensions <inversion>`), the *query* keyword should be omitted.
Instead, the actual query should be placed at the end of the statement. For example, to output some query's result in inverted syntax, the following could be used::

    output 'outfile' dialect:tsv select * from table1;

In non-inverted syntax, this is equivalent to::

    select * from output('query:select * from table1','outfile','dialect:tsv');

Inverting queries that include the *query* parameter permits "pipelining" of many virtual table functions, while using an easy-to-read syntax.
So to re-order a file and add line numbers you can do::

    output 'outfile.csv' rowidvt select * from (file 'infile.csv') order by C2;

Let's look at the details of this query. The :mod:`~functions.vtable.file` function reads *infile.csv* and returns the data as a table.
The returned data are then ordered by column C2.
The query *select \* from (file 'infile.csv') order by C2* is the *query* parameter of the :mod:`~functions.vtable.rowidvt` function.
This function adds a *rowid* column to the query result that indicates the position of each row in the result (and not in the initial table).
The result of the :mod:`~functions.vtable.rowidvt` function is forwarded to :mod:`~functions.vtable.output`, which writes it to the *outfile.csv* file.

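
To make the order of the pipeline stages concrete, here is a rough plain-Python analogue of the same flow. It does not use madIS at all; it assumes that *C2* refers to the second column, that values are compared as text, and that row numbering starts at 1. The function name *reorder_with_rowid* and the sample data are made up for the example.

```python
import csv
import io


def reorder_with_rowid(src, dst):
    """Read CSV rows, sort them by the second column, number them, write them."""
    rows = list(csv.reader(src))            # stage 1: like (file 'infile.csv')
    rows.sort(key=lambda row: row[1])       # stage 2: like order by C2 (text comparison)
    numbered = ([i] + row for i, row in enumerate(rows, start=1))  # stage 3: like rowidvt
    csv.writer(dst, lineterminator="\n").writerows(numbered)       # stage 4: like output


# In-memory files keep the sketch self-contained.
source = io.StringIO("x,pear\ny,apple\nz,fig\n")
target = io.StringIO()
reorder_with_rowid(source, target)
print(target.getvalue())
```

As in the madIS query, the numbers reflect each row's position after sorting, not its position in the input.
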
.. _unotesreturntypes:

Understanding function return types
===================================

To understand how function return types work, make sure you understand how `SQLite types <http://www.sqlite.org/datatype3.html>`_ work.
The return type of the row and aggregate madIS functions refers to the storage class typically returned by the function.
Virtual table functions return a table having a static (or dynamic) schema with column names and `type affinities <http://www.sqlite.org/datatype3.html#affinity>`_.
To clarify the difference between the returned types of row/aggregate functions and those of virtual table functions, consider the following example::

    mterm> select * from (select kwnum('lol') as a) where a=='1';
    Query executed in 0 min. 0 sec 17 msec

Here the :func:`~functions.row.text.kwnum` function returns the number of keywords in the string 'lol', which is an integer.
Although the returned column (named *a*) holds an integer, it carries no type affinity, so the text literal '1' in the *where* clause is not converted to an integer
and the condition fails. In the following example, the built-in SQLite *cast* expression suggests an integer type affinity,
so the text literal '1' is automatically converted to an integer and the condition evaluates to true.

::

    mterm> select * from (select cast(kwnum('lol') as int) as a) where a=='1';
    1
    Query executed in 0 min. 0 sec 21 msec

In conclusion, the values returned by row and aggregate functions, even :ref:`multisets <tutmultiset>`, do not imply type affinity.

.. note::

    The same result could be reached using virtual table functions, e.g. :mod:`~functions.vtable.typing` or :mod:`~functions.vtable.setschema`.

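
The affinity behaviour described in this section comes from SQLite itself, so it can be reproduced with Python's standard *sqlite3* module, without madIS. In the sketch below the built-in *length* function is used as a stand-in for *kwnum* (an assumption made only so that the example runs without madIS); like any function result, its value is an integer with no type affinity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# length('lol') stands in for kwnum('lol'): an integer result with no type affinity,
# so the text literal '3' is compared as text and never matches.
plain = conn.execute(
    "select * from (select length('lol') as a) where a = '3'"
).fetchall()

# cast(... as int) gives the column INTEGER affinity, so '3' is converted
# to an integer before the comparison takes place.
casted = conn.execute(
    "select * from (select cast(length('lol') as int) as a) where a = '3'"
).fetchall()

print(plain)
print(casted)
conn.close()
```

The first query returns no rows, while the second returns a single row holding the integer 3.
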
.. _unotesshell:

Shell interaction
=================

madIS's interactive shell supports some helpful shell commands and functions that assist with composing complex queries.
These can be listed in the terminal with the *.help* command.

**Names function**

The :mod:`~functions.vtable.names` function is a virtual table function that operates over a query (see :ref:`unotesparameters`)
and returns the column names of the query result.

::

    mterm> names select * from file('test.csv','header:t') ;
    City|Region|Country|Population
    Query executed in 0 min. 0 sec 24 msec

The :mod:`~functions.vtable.coltypes` function has similar functionality, additionally presenting each column's
`type affinity <http://www.sqlite.org/datatype3.html#affinity>`_ when this information is contained in its input.

::

    mterm> coltypes select * from file('/home/meili/Desktop/test.csv','header:t') ;
    City|text
    Region|text
    Country|text
    Population|text

.. _unotesemptyschema:

Empty schema exception
======================

A price to pay for using virtual tables with dynamic schema creation is that an exception is raised when no data are provided, for example when an empty file
is given to the :mod:`~functions.vtable.file` function or when
a query that returns no rows is given to a virtual table that accepts input *queries* (e.g. the :mod:`~functions.vtable.cache` function)::

    mterm> select * from (cache select 5 as a where a!=5);
    Madis SQLError: operator cache: Cannot initialise dynamic schema virtual table without data

To avoid this issue, especially in automatic flow execution where a flow crash is undesirable, the :mod:`~functions.vtable.setschema` function can be used.
*Setschema* is a virtual table function that enforces a given schema. For non-empty result sets, it can be used to project,
rename and typecast inner query columns, while for an empty result set it works as a static schema definition.

::

    mterm> select * from (setschema 'a' cache select 5 as a where a!=5);
    Query executed in 0 min. 0 sec 62 msec

In the example above, the *cache* function could produce an empty schema exception. Applying the *setschema* function with the desired schema
(column a with NONE type) avoids this problem.

:ref:`Multiset <tutmultiset>` functions are another possible cause of empty schema exceptions, as they are actually implemented
through the :mod:`~functions.vtable.expand` virtual table function.

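
The reason an empty input is a problem can be sketched in plain Python. The code below is a hypothetical stand-in, not madIS's implementation: *with_schema* and *EmptySchemaError* are invented names, and rows are modelled as dictionaries. Without an explicit schema the column names have to be read from the first row, so an empty input leaves nothing to infer from; a declared schema removes that dependency.

```python
class EmptySchemaError(Exception):
    """Raised when a schema must be inferred but the input has no rows."""


def with_schema(rows, schema=None):
    """Return (column_names, row_iterator) for a stream of dict rows.

    Hypothetical model of a dynamic-schema table: when no schema is given,
    the column names are taken from the first row of the input.
    """
    rows = iter(rows)
    if schema is not None:
        # Declared schema: valid even for empty input, and it also projects columns.
        return list(schema), ([row[name] for name in schema] for row in rows)
    try:
        first = next(rows)
    except StopIteration:
        raise EmptySchemaError("cannot infer a schema without data") from None
    names = list(first)

    def produce():
        # Re-emit the row consumed for inference, then the rest of the stream.
        yield [first[name] for name in names]
        for row in rows:
            yield [row[name] for name in names]

    return names, produce()


names, data = with_schema([], schema=["a"])
print(names, list(data))  # schema is known although there are no rows

try:
    with_schema([])
except EmptySchemaError as err:
    print(err)
```

In this model the declared schema plays the role that *setschema* plays in the queries above.
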
.. _unotesefficiencyvt:

Efficiency in streaming virtual tables
======================================

Virtual table functions that work as streams do not need to materialise the whole input/output data stream before returning it; they can produce rows on demand.
Because nothing is stored, every additional scan has to generate the rows again. Executing a *join*, for example, between streaming virtual tables over a huge volume of data is therefore not a good idea, as the data will have to be re-produced
over and over again. Using the :ref:`cache <tutcache>` virtual table function on one of the *streaming* queries will significantly increase performance.

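
The cost difference can be illustrated with a toy model in plain Python. It assumes a simple nested-loop join in which the inner input is rescanned once per outer row; the names *stream* and *nested_loop_join* are invented for the example, and a Python generator plays the part of a streaming virtual table.

```python
produced = 0  # total number of rows generated by the streaming source


def stream(n):
    """Streaming source: regenerates its rows every time it is scanned."""
    global produced
    for value in range(n):
        produced += 1
        yield value


def nested_loop_join(outer, make_inner):
    """Pair equal values; the inner side is rescanned once per outer row."""
    return [(o, i) for o in outer for i in make_inner() if o == i]


# Streaming inner side: 100 outer rows trigger 100 full scans of 100 rows each.
produced = 0
nested_loop_join(range(100), lambda: stream(100))
streamed_cost = produced

# Cached inner side: the stream is produced once and the stored copy is rescanned.
produced = 0
cached = list(stream(100))
nested_loop_join(range(100), lambda: cached)
cached_cost = produced

print(streamed_cost, cached_cost)
```

In this model the streaming join generates 10000 rows, whereas the cached variant generates only 100, which is the saving the *cache* function is meant to provide.
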