Specification of the Ntuple Query Processor in PAW (since version 2.07)


See also the PAW 2.07 release notes.


Introduction

The interactive analysis of high energy physics event data using HBOOK ntuples has always been one of the major strengths of the PAW system. With the design and implementation of the PIAF system and the advent of Column Wise Ntuples (CWNs) the quantity of event data which can be analysed interactively has been significantly increased. Other improvements due to the use of CWNs include the support for more data types and new possibilities for structuring the data. CWNs can store boolean, bit field, integer, real, double precision and character string data types. Data can be structured in arrays of one or more dimensions, with the last dimension of each array being of variable length. This allows for efficient storage of lists of values, lists of vectors etc. The actual length of the array in a given event is defined by another integer variable in the ntuple.

The component of the PAW system responsible for executing queries like ntuple/plot or ntuple/project was previously only partially adapted to handle these new features. Furthermore, this part of PAW lacked stability and robustness, and was hindered by many built-in limitations.

To overcome these deficiencies a new query processor has been developed which removes most limitations of the old system. More importantly, it offers a regular syntax which gives the user much greater freedom when expressing queries. It also supports all the data types offered by CWNs, in combination with automatic type promotion. The new query language is largely backward compatible, with more powerful alternatives being offered in the few incompatible cases. Non-scalar ntuple variables such as vectors and multidimensional arrays, possibly of variable length, are fully integrated. Other new features include access to KUIP vectors and the use of arrays that index other arrays, which allows for more complex structures in ntuples.

The structure of the query compiler allows for optimisations such as static sub-expression evaluation and reordering of terms based on their complexity. An upgrade of the MASK mechanism is foreseen, so that it can be used to cache the results of cuts (macros).

Basic types supported

The variable types supported by the new query processor are those defined for the Column Wise Ntuple, namely boolean, bit field, integer, real, double precision and character string.

The Ntuple variables can be simple variables (RWN and CWN) or array variables (CWN only).

Basic operators

In the table below, the key letters L, R and N indicate whether an operator is left, right or non-associative. The table begins with the operators of highest precedence.


+------------+---------------+---------------+-------------------------------+---------------------------------+
| Precedence |   Operator    | Associativity | Description                   | Type                            |
|   level    |               |               |                               |                                 |
+------------+---------------+---------------+-------------------------------+---------------------------------+
|      1     | **            |       R       | exponentiation                |                                 |
+------------+---------------+---------------+-------------------------------+---------------------------------+
|      2     | -             |       L       | unary minus                   | U32,U64,I32,I64,F32,F64         |
+------------+---------------+---------------+-------------------------------+---------------------------------+
|      3     | *             |       L       | multiply                      | U32,U64,I32,I64,F32,F64         |
|            | /             |       L       | divide                        | U32,U64,I32,I64,F32,F64         |
+------------+---------------+---------------+-------------------------------+---------------------------------+
|      4     | +             |       L       | addition                      | U32,U64,I32,I64,F32,F64 (S ??)  |
|            | -             |       L       | subtraction                   | U32,U64,I32,I64,F32,F64         |
+------------+---------------+---------------+-------------------------------+---------------------------------+
|      5     | = == .eq.     |       N       | equality                      | U32,U64,I32,I64,F32,F64,S       |
|            | <> != .ne.    |       N       | non-equality                  | U32,U64,I32,I64,F32,F64,S       |
|            | # .ct.        |       N       | close to (*specify*)          | U32,U64,I32,I64,F32,F64,S       |
|            | < .lt.        |       N       | less than                     | U32,U64,I32,I64,F32,F64,S       |
|            | <= .le.       |       N       | less than or equal            | U32,U64,I32,I64,F32,F64,S       |
|            | > .gt.        |       N       | greater than                  | U32,U64,I32,I64,F32,F64,S       |
|            | >= .ge.       |       N       | greater than or equal         | U32,U64,I32,I64,F32,F64,S       |
|            | a < b < c     |       N       | double comp. (also .lt. etc.) | U32,U64,I32,I64,F32,F64,S       |
|            | a < b <= c    |       N       |                               |                                 |
|            | a <= b < c    |       N       |                               |                                 |
|            | a <= b <= c   |       N       |                               |                                 |
|            | a > b > c     |       N       |                               |                                 |
|            | a > b >= c    |       N       |                               |                                 |
|            | a >= b > c    |       N       |                               |                                 |
|            | a >= b >= c   |       N       |                               |                                 |
+------------+---------------+---------------+-------------------------------+---------------------------------+
|      6     | ! .not.       |       R       | unary boolean not             | B                               |
|            | && .and.      |       L       | boolean and                   | B                               |
|            | || .or.       |       L       | boolean or                    | B                               |
+------------+---------------+---------------+-------------------------------+---------------------------------+

Basic expressions

The basic expression types supported by the new query processor appear below:

  boolean_expression       :       boolean_term
                                   boolean_expression '.or.' boolean_term

  boolean_term             :       boolean_factor
                                   boolean_term '.and.' boolean_factor

  boolean_factor           :       comparison
                                   '.not.' boolean_factor

  comparison               :       expression
                                   expression CMP_OP expression
                                   expression CMP_OP expression CMP_OP expression

  expression               :       literal
                                   name
                                   expression NUM_OP expression
                                   '(' boolean_expression ')'

  name                     :       nt-variable
                                   builtin-function
                                   comis-function
                                   kuip-vector
                                   cut
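
The grammar above can be read as a recursive-descent evaluator with one function per rule. The Python sketch below is only an illustration and not the PAW implementation: the names tokenize, QueryEvaluator and evaluate are hypothetical, boolean_term is assumed to combine factors with '.and.', only the dotted operator spellings and the numeric operators + - * / are handled, names are looked up in a plain dictionary with lower-case keys, and a double comparison 'a op b op c' is assumed to mean '(a op b) and (b op c)'.

```python
import operator
import re

# Token pattern: dotted keywords, numbers, names, then single-character symbols.
TOKEN = re.compile(
    r"\s*(\.(?:or|and|not|lt|le|gt|ge|eq|ne)\.|\d+(?:\.\d+)?|[A-Za-z_]\w*|[()+\-*/])"
)
CMP_OPS = {".lt.": operator.lt, ".le.": operator.le, ".gt.": operator.gt,
           ".ge.": operator.ge, ".eq.": operator.eq, ".ne.": operator.ne}


def tokenize(text):
    """Split a query string into lower-case tokens."""
    text, tokens, pos = text.strip().lower(), [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected input at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class QueryEvaluator:
    """Recursive-descent evaluator with one method per grammar rule."""

    def __init__(self, tokens, variables):
        self.tokens, self.pos, self.variables = tokens, 0, variables

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def boolean_expression(self):
        value = self.boolean_term()
        while self.peek() == ".or.":
            self.take()
            rhs = self.boolean_term()  # always parse the right-hand side
            value = bool(value) or bool(rhs)
        return value

    def boolean_term(self):
        value = self.boolean_factor()
        while self.peek() == ".and.":  # assumption: terms are joined by .and.
            self.take()
            rhs = self.boolean_factor()
            value = bool(value) and bool(rhs)
        return value

    def boolean_factor(self):
        if self.peek() == ".not.":
            self.take()
            return not self.boolean_factor()
        return self.comparison()

    def comparison(self):
        left = self.expression()
        if self.peek() not in CMP_OPS:
            return left
        first_op = CMP_OPS[self.take()]
        middle = self.expression()
        result = first_op(left, middle)
        if self.peek() in CMP_OPS:  # double comparison, e.g. a .lt. b .le. c
            second_op = CMP_OPS[self.take()]
            right = self.expression()
            result = result and second_op(middle, right)
        return result

    def expression(self, level=0):
        # NUM_OP precedence: level 0 handles + and -, level 1 handles * and /.
        ops = [{"+": operator.add, "-": operator.sub},
               {"*": operator.mul, "/": operator.truediv}][level]
        value = self.expression(1) if level == 0 else self.primary()
        while self.peek() in ops:
            op = ops[self.take()]
            rhs = self.expression(1) if level == 0 else self.primary()
            value = op(value, rhs)
        return value

    def primary(self):
        token = self.take()
        if token is None:
            raise SyntaxError("unexpected end of query")
        if token == "(":
            value = self.boolean_expression()
            if self.take() != ")":
                raise SyntaxError("missing ')'")
            return value
        if token[0].isdigit():
            return float(token)
        return self.variables[token]  # variable names are lower-cased


def evaluate(query, variables):
    """Evaluate a query string against a dictionary of variable values."""
    parser = QueryEvaluator(tokenize(query), variables)
    value = parser.boolean_expression()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value
```

For instance, evaluate("x .gt. 1 .and. .not. (y .lt. 2)", {"x": 3, "y": 5}) walks boolean_expression, boolean_term, boolean_factor and comparison in turn, mirroring the nesting of the rules.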

The concept of arrays and shape matching

Arrays

Ntuple variables can have up to seven dimensions; the last one can be of variable length.

KUIP vectors can be three dimensional.

Arrays can be indexed by signed or unsigned integer expressions. Each dimension should have one index, which can be either a scalar or a range.

A range can be written as i:j, :j, i: or :. A missing bound takes its extreme value (1 or the length of the dimension). If the array name is not followed by ( the full array is used; otherwise the reference takes a form such as 'SomeArray(index1,index2)', here for a two-dimensional array.

If one or more indices are ranges the resultant object is a smaller array, possibly of a lower dimensionality.
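
The range rules can be illustrated with a small Python sketch that resolves one index specification into 1-based inclusive bounds. The helper names resolve_index and select are hypothetical, and raising an error for out-of-range bounds is an assumption, since the text does not say how such indices are treated.

```python
def resolve_index(spec, length):
    """Return the 1-based inclusive (first, last) bounds chosen by one index.

    spec is a string such as "3", "2:4", ":4", "2:" or ":";
    length is the length of the dimension being indexed.
    """
    spec = spec.strip()
    if ":" not in spec:
        first = last = int(spec)          # scalar index selects one element
    else:
        low, high = spec.split(":", 1)
        first = int(low) if low.strip() else 1         # missing lower bound
        last = int(high) if high.strip() else length   # missing upper bound
    if not 1 <= first <= last <= length:
        # Assumption: out-of-range indices are rejected.
        raise IndexError(f"index {spec!r} outside 1..{length}")
    return first, last


def select(values, spec):
    """Apply one index to a one-dimensional list using 1-based bounds."""
    first, last = resolve_index(spec, len(values))
    picked = values[first - 1:last]
    # A scalar index yields an element, a range yields a smaller array.
    return picked if ":" in spec else picked[0]
```

With values = [10, 20, 30, 40, 50], select(values, "2:3") gives the sub-array [20, 30], while select(values, "4") gives the single element 40.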

Shape matching

When arrays are used in expressions, the shapes of the arrays, that is the number of dimensions and the length of each dimension, must obey the following rules.

Assume an operator O with n scalar arguments

r = O(a1...an)

This operator is then defined for one or more non scalar arguments as (in quasi tensor notation)

r(i) = O(a1...ak(i)..al(i)..an)

This implies that all non-scalar arguments have the same shape and that the result r will have this shape. This automatically defines, for example, scalar times array as componentwise multiplication.

For the builtin functions of more than one argument, their interaction with non-scalar arguments has been defined individually.
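
As an illustration of the shape-matching rule, the Python sketch below applies a scalar operator componentwise to nested lists. It is a minimal model under stated assumptions, not the PAW implementation: arrays are represented as nested Python lists, the helper names shape and apply_elementwise are hypothetical, and a shape mismatch is reported as an error.

```python
import operator


def shape(x):
    """Return the shape of a nested list as a tuple; scalars have shape ()."""
    if isinstance(x, list):
        return (len(x),) + (shape(x[0]) if x else ())
    return ()


def apply_elementwise(op, *args):
    """Apply the scalar operator op to scalars and same-shaped arrays."""
    shapes = {shape(a) for a in args if isinstance(a, list)}
    if not shapes:
        return op(*args)                  # all arguments are scalars
    if len(shapes) > 1:
        # All non-scalar arguments must share one shape.
        raise ValueError(f"shape mismatch: {sorted(shapes)}")
    length = next(iter(shapes))[0]
    # Recurse over the first dimension; scalars are passed through unchanged.
    return [
        apply_elementwise(op, *[a[i] if isinstance(a, list) else a for a in args])
        for i in range(length)
    ]
```

Here apply_elementwise(operator.mul, 2, [1, 2, 3]) yields [2, 4, 6], which is the "scalar times array" case mentioned above, and the result always has the shape of the non-scalar arguments.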

Builtin functions

Constants

Name          Value
pi            3.14159 (real)
dpi           3.1415927... (double)
true          true (B)
false         false (B)
uint32_min    Smallest 32-bit unsigned integer
uint32_max    Largest 32-bit unsigned integer
uint64_min    Smallest 64-bit unsigned integer
uint64_max    Largest 64-bit unsigned integer
int32_min     Smallest 32-bit integer
int32_max     Largest 32-bit integer
int64_min     Smallest 64-bit integer
int64_max     Largest 64-bit integer
float32_min   Smallest 32-bit floating point
float32_max   Largest 32-bit floating point
float64_min   Smallest 64-bit floating point
float64_max   Largest 64-bit floating point

Math functions

Bit handling functions

The operand(s) and the result are bit strings. In the discussion below, m and n can each be an Ntuple variable, a bit string constant or an expression with a bit string result. Bits are numbered from left to right, starting from zero.

Logical operations

An example of a bit string function is:

   
    NTUPLE/SCAN 30 IAND(X,B'1010').EQ.B'0101'

Shift operations

The shift count is represented by k. The absolute value of k specifies the number of positions to shift, while its sign specifies the direction of the shift, i.e. a left shift when k>0, no shift when k=0 and a right shift when k<0.
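
The sign convention for the shift count can be sketched in Python as follows. The function name shift_bits is hypothetical, and the 32-bit word size and zero fill on both sides are assumptions not stated in the text.

```python
MASK32 = 0xFFFFFFFF  # assumption: bit strings are 32 bits wide


def shift_bits(m, k):
    """Shift m by |k| positions: left when k > 0, right when k < 0."""
    m &= MASK32
    if k > 0:
        return (m << k) & MASK32   # left shift, bits leaving the word are lost
    if k < 0:
        return m >> -k             # right shift, zero fill from the left
    return m                       # k == 0: no shift
```

For example, shift_bits(0b0101, 1) gives 0b1010 and shift_bits(0b1010, -1) gives back 0b0101.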

Bit testing

Bit subfields

Conversion

For any numeric or boolean argument:

Vector reduction operations

These functions operate on all the elements they are given. A slice of an array can be passed, for example vmin(p(3,:)) for a 2D array p; each array index can be either an integer selecting a specific row, column, etc., or a range. The full range is specified by :, and partial ranges such as 2:3 or istart: are also allowed (see the section on arrays above).
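
A reduction over a slice such as vmin(p(3,:)) can be modelled in Python as shown below. This is a sketch only: the array is a nested list in which p[i - 1][j - 1] stands for p(i,j), the helper flatten is hypothetical, and the sample values are made up for illustration.

```python
def flatten(x):
    """Yield every scalar element of a nested list."""
    if isinstance(x, list):
        for item in x:
            yield from flatten(item)
    else:
        yield x


def vmin(array):
    """Return the smallest of all elements handed to the function."""
    return min(flatten(array))


# Illustrative 3x3 array; p[i - 1][j - 1] plays the role of p(i,j).
p = [[4.0, 9.0, 2.0],
     [7.0, 1.0, 8.0],
     [6.0, 3.0, 5.0]]

row_3 = p[3 - 1]                                  # p(3,:)
column_2_rows_2_to_3 = [row[2 - 1] for row in p[2 - 1:3]]  # p(2:3,2)
```

The reduction sees only the elements of the slice it receives, so vmin(row_3) considers the third row alone, while vmin(p) considers the full array.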

String manipulations

Two functions are available:

Predefined cuts

Predefined cuts are a kind of alias. A cut is referenced by a number between 1 and 99 inclusive and can contain any valid "boolean_expression" (see above). Inside an expression a cut is referenced by a $ followed by the number of the cut. Cuts are defined by the CUT command, which can also be used to list them. Cuts can also be written to a file and read back later.
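
One way to picture the alias behaviour is textual expansion of each $n reference before the expression is evaluated. The Python sketch below works on that assumption; whether PAW substitutes text or evaluates cuts separately is not stated here, and the name expand_cuts and the dictionary of cut definitions are hypothetical.

```python
import re


def expand_cuts(expression, cuts):
    """Replace each $n in expression by the parenthesised definition of cut n.

    cuts maps cut numbers (1 to 99) to boolean expression strings.
    """
    def replace(match):
        number = int(match.group(1))
        if not 1 <= number <= 99:
            raise ValueError(f"cut number {number} outside 1..99")
        if number not in cuts:
            raise KeyError(f"cut ${number} is not defined")
        # Parentheses keep the cut intact inside a larger expression.
        return "(" + cuts[number] + ")"

    return re.sub(r"\$(\d+)", replace, expression)
```

The parentheses matter when a cut containing .or. is combined with .and., since the cut must still be evaluated as a unit.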

A special type of cut, called a graphical cut, is defined by the (new) command GCUT. The command GCUT is like NT/PLOT, but after the command the user can select a polygon interactively. When a graphical cut is applied, the expressions used to define the cut are re-evaluated and the cut returns true if the resulting point is inside the polygon. In the case of a one-dimensional plot the graphical cut selects a range.
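
The inside-the-polygon test can be illustrated with the standard ray-casting method. The Python sketch below shows that general technique and is not necessarily the algorithm PAW uses; the function names are hypothetical, the polygon is a list of (x, y) vertices, and treating the one-dimensional range as inclusive is an assumption.

```python
def point_in_polygon(x, y, polygon):
    """Return True if (x, y) lies inside the polygon (ray-casting test)."""
    inside = False
    count = len(polygon)
    for i in range(count):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % count]   # edge to the next vertex, wrapping
        # Does a horizontal ray from (x, y) towards +x cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside         # each crossing flips the state
    return inside


def in_range(x, low, high):
    """One-dimensional graphical cut: the selected range [low, high]."""
    return low <= x <= high
```

For each event the two plotted expressions would be evaluated to give (x, y), and the cut result is the return value of the test.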

COMIS functions

Functions known to the COMIS interpreter can be used in expressions. There are three possibilities:

  • name(...) : The function name is already known to COMIS.
  • name.f(...) : The Fortran source file name.f is loaded by COMIS and the function name is called.
  • name.f77(...) : The Fortran source file name.f is compiled and loaded by COMIS, and the function name is called.

The types of the arguments and the return type of the function are checked as far as possible. The number of arguments for a COMIS function is not limited.

Sometimes a file name can be ambiguous with other constructs. For example, VMS file names can contain characters like [ and ], and UNIX file names contain /, which can be interpreted as a divide operation. To avoid such ambiguities, file names can be enclosed in double quotes "...". For simple file names like myfile.f the double quotes can be omitted.

KUIP vectors

KUIP vectors can be used in the selection mechanism or as arguments to a COMIS function. They can be real or integer arrays with at most three dimensions. A KUIP vector should not be changed during the Ntuple processing.



    Paw.Support@Cern.Ch