Flow¶

Disclaimer: work in progress, not complete yet.

Introduction¶

Flow or TQLFlow is a data pipeline that allows the user to designate TQL Clauses in various sequences for more advanced data query, mapping, transforms and more. In TQL (also SQL) the “steps” of execution are fixed as such:

5. SELECT <columns>
1. FROM <table>
2. WHERE <predicate on rows>
3. GROUP BY <columns> COMBINE <aggregator-function>
4. HAVING <predicate on groups> (Not currently available in TQL/TQLFlow)
6. ORDER BY <columns>

ref https://en.wikipedia.org/wiki/SQL_syntax

This form of data query, transform (especially schema-on-read) is better support with a more flexible query language. Flow has only a couple of limitations with regards to the order of the clauses. But first, here are the possible clauses or actions that can be performed:

Flow can be used with IQ and BurstChain® endpoints. In fact, in some cases just a WHERE clause is required as a way of allowing the user to just supply that construct.

Stages¶

Clause	Description
`SELECT`	The same construct as TQL (see ¹ below), except without Aggregator functions.
`SUBFLOW`	Pass documents at this clause through any number of flow aggregations, with each aggregation result outputting an array in a single return document
`JOIN`	Similar to TQL; but only allows for `JOIN`, and stores joined in an array.
`WHERE`	Same as TQL’s `WHERE` clause, filters the set of documents at the current point in TQLFlow.
`GROUP BY`	This will group documents into a list based on the attribute/s listed. (see ² below)
`UNGROUP BY`	This will unwind a list of values into individual documents based on the attribute provided.
`ORDER BY`	Orders the current set of documents at this point of the TQLFlow.
`SKIP`	Skips the number of documents specified for the current set of documents at this stage in the TQLFlow.
`LIMIT`	Similar to TQL, limits the number of documents at the current stage in the TQLFlow.
`FINALIZE`	This is a final anonymous JavaScript function. Consumes one document and must return one document. Can support more complex transforms than possible in the support stages above. (NOTE: Must be the last clause/action )

¹ All of the functions allowed in SELECT (as documented in TQL - Select Functions section) are available within Flow.

² All of the functions allowed in GROUP BY (as documented in TQL - Group By Functions section) are available within Flow.

To understand how this works a Flow always is executed against a collection or chain. Visualize that the entire set of documents in the collection are transformed in some way by the first clause. After the first clause is complete, that set of documents has been altered. The second clause, then acts upon the results of the previous clause. And so on until the end of the Flow. The Flow can consume a lot of resources to execute, it is usually a best practice to begin a Flow with some WHERE and/or SELECT that begins to reduce the starting collection or chain.

Clauses¶

SELECT¶

The same construct as TQL, except without GROUP BY functions.

SELECT <field>, <field>, ... <field>

SELECT can have one or more fields and field can be any of these: column_name, literals, select-functions, ALL, *

Example¶

Input documents:

[
    {
        "primary_surgeon_name": "Smith, John",
        "supplies": [],
        "facility_name": "Facility 1",
        "case_number": "11111",
        "specialty": "medicine"
    },
    {
        "primary_surgeon_name": "Doe, Jane",
        "supplies": ["scissors"],
        "facility_name": "Facility 2",
        "case_number": "22222",
        "specialty": "medicine"
    }
]

SELECT clause:

SELECT primary_surgeon_name,
    facility_name AS facility,
    ['sheet drapes', 'soap'] AS supplies,
    IF(supplies EQ []
        THEN true,
        ELSE false) AS had_supplies

Output documents:

[
    {
        "primary_surgeon_name": "Smith, John",
        "supplies": ["sheet drapes", "soap"],
        "facility": "Facility 1",
        "had_supplies": false
    },
    {
        "primary_surgeon_name": "Doe, Jane",
        "supplies": ["sheet drapes", "soap"],
        "facility": "Facility 2",
        "had_supplies": true
    }
]

For the same input…

SELECT clause:

SELECT ALL,
    facility_name AS facility,
    '0' AS case_number,

Output documents:

[
    {
        "primary_surgeon_name": "Smith, John",
        "supplies": [],
        "facility_name": "Facility 1",
        "facility": "Facility 1",
        "case_number": "0",
        "specialty": "medicine"
    },
    {
        "primary_surgeon_name": "Doe, Jane",
        "supplies": ["scissors"],
        "facility": "Facility 2",
        "case_number": "0",
        "specialty": "medicine"
    }
]

SUBFLOW¶

Pass documents at this stage through any number of flow aggregations, with each aggregation result outputting an array in a single return document

SUBFLOW can contain any number of groups of flow clauses.

SUBFLOW <flow-stages> AS <alias>

SUBFLOW <flow-clauses> AS <alias1>,
    <flow-clauses> AS <alias2>,
    ...,
    <flow-clauses> AS <aliasN>

Examples¶

Input documents:

[   
    {
        "x": 1,
        "y": [1, 2, 3]
    },
    {
        "x": 2,
        "y": [1, 2, 3]
    }
]

SUBFLOW clause:

SUBFLOW [
        SELECT x
    ] AS xs,
    [
        SELECT MAP(y USING elem WITH ADD(elem, x)) AS sum_yx
        UNGROUP BY sum_xy
        GROUP BY ALL COMBINE SUM(sum_xy) AS total_sum
    ] AS sum

Output documents:

[
  {
    "xs": [
      {
        "x":  1
      },
      {
        "x": 2
      }
    ],
    "sum": [
      {
        "total_sum": 21
      }
    ]
  }
]

JOIN¶

Similar to TQL; but only allows for JOIN, and stores joined in an array.

JOIN <table> [AS alias] ON <condition>

Examples¶

Input documents:

[
    {
        "id": "0",
        "value": 1
    },
    {
        "id": "1",
        "value": 2
    }
]

Table ‘lookup_table’ documents:

[
    {
        "lookup_id": "0",
        "lookup_value": 1
    },
    {
        "lookup_id": "0",
        "lookup_value": 2
    },
    {
        "lookup_id": "not present in input",
        "lookup_value": 3
    }
]

JOIN statement:

JOIN lookup_table AS lt ON id EQ lt.lookup_id

Output documents:

[
    {
        "id": "0",
        "value": 1,
        "lt": [
            {
                "lookup_id": "0",
                "lookup_value": 1
            },
            {
                "lookup_id": "0",
                "lookup_value": 2
            }
        ]
    },
    {
        "id": "1",
        "value": 2,
        "lt": []
    }
]

WHERE¶

Same as TQL’s WHERE, filters the set of documents at the current clause in TQLFlow.

WHERE <condition>

Examples¶

Input documents:

[
    {
        "x": 1,
        "y": false
    },
    {
        "x": 2,
        "y": false
    },
    {
        "x": 3,
        "y": true
    }
]

WHERE clause:

WHERE x EQ 2 OR y EQ true

Output documents:

[
    {
        "x": 2,
        "y": false
    },
    {
        "x": 3,
        "y": true
    }
]

GROUP BY¶

This will group documents into a list based on the attribute/s listed.

GROUP BY <columns | * | ALL> COMBINE <group-by-function AS alias>+

Examples¶

Input documents:

[
    {
        "x": 1,
        "y": false
    },
    {
        "x": 2,
        "y": false
    },
    {
        "x": 3,
        "y": true
    }
]

GROUP BY clause:

GROUP BY y SUM(x) AS x_sum, FIRST(y) AS y

Output documents:

[
    {
        "x_sum": 3,
        "y": false
    },
    {
        "x_sum": 3,
        "y": true
    }
]

For the same input…

GROUP BY stage:

GROUP BY ALL SUM(x) AS x_sum

Output documents:

[
    {
        "x_sum": 6
    }
]

UNGROUP BY¶

This will unwind a list of values into individual documents based on the attribute provided.

UNGROUP BY <column> [KEEP]

Examples¶

Input documents:

[
    {
        "x": [1, 2],
        "y": true
    },
    {
        "x": [],
        "y": false
    }
]

UNGROUP BY clause:

UNGROUP BY x

Output documents:

[
    {
        "x": 1,
        "y": true
    },
    {
        "x": 2,
        "y": true
    }
]

For the same input…

UNGROUP BY clause:

UNGROUP BY x KEEP

Output documents:

[
    {
        "x": 1,
        "y": true
    },
    {
        "x": 2,
        "y": true
    },
    {
        "y": false
    }
]

ORDER BY¶

Orders the current set of documents at this point in the TQLFlow.

ORDER BY <columns> [DESC]

Examples¶

Input documents:

[
    {
        "x": 2
    },
    {
        "x": 1
    },
    {
        "x": 3
    }
]

ORDER BY clause:

ORDER BY x

Output documents:

[
    {
        "x": 1
    },
    {
        "x": 2
    },
    {
        "x": 3
    }
]

For the same input…

ORDER BY clause:

ORDER BY x DESC

Output documents:

[
    {
        "x": 3
    },
    {
        "x": 2
    },
    {
        "x": 1
    }
]

SKIP¶

Skips the number of documents specified for the current set of documents at this point in the TQLFlow.

SKIP <integer>

Examples¶

Input documents:

[
    {
        "x": 1
    },
    {
        "x": 2
    },
    {
        "x": 3
    },
    {
        "x": 4
    }
]

SKIP clause:

SKIP 2

Output documents:

[
    {
        "x": 3
    },
    {
        "x": 4
    }
]

LIMIT¶

Similar to TQL, limits the number of documents at the current point in the TQLFlow.

LIMIT <integer>

Examples¶

Input documents:

[
    {
        "x": 1
    },
    {
        "x": 2
    },
    {
        "x": 3
    },
    {
        "x": 4
    }
]

LIMIT clause:

LIMIT 2

Output documents:

[
    {
        "x": 1
    },
    {
        "x": 2
    }
]

FINALIZE¶

This is a final anonymous JavaScript function. Consumes one document and must return one document. Can support more complex transforms than possible in the support stages above. (NOTE: Must be the last clause/action.)

FINALIZE function(<input-document-handle>) {<JavaScript>}

Examples¶

Input documents:

[   
    {
        "x": 1,
        "y": [1, 2, 3]
    },
    {
        "x": 2,
        "y": [1, 2, 3]
    }
]

FINALIZE clause:

FINALIZE function(doc) {

    function add(a, b) {
        return a + b;
    }

    return doc.x.map(elem => add(elem, doc.y));

Output Documents:

[   
    {
        "x": 1,
        "y": [2, 3, 4]
    },
    {
        "x": 2,
        "y": [3, 4, 5]
    }
]

Complex Examples¶

Simple¶

This example looks very similar to a typical TQL call, here it is mandated to perform the WHERE first, then SELECT columns,... and LIMIT the results. Yes this could have been just as easily written in TQL; but the flexibility of modifying the clause order wouldn’t exist.

WHERE primary_surgeon_name IN ('AppleSeed, John')
SELECT primary_surgeon_name AS column,
  supplies,
  facility_name,
  case_numbers
LIMIT 100

In this basic example, lets say that we started with a collection that had these docs in them:

[{
    "primary_surgeon_name": "Smith, John",
    "supplies": [],
    "facility_name": "Facility 1",
    "case_number": "11111",
    "specialty": "medicine"
},
{
    "primary_surgeon_name": "Doe, Jane",
    "supplies": [],
    "facility_name": "Facility 2",
    "case_number": "22222",
    "specialty": "medicine"
},
{
    "primary_surgeon_name": "AppleSeed, John",
    "supplies": [],
    "facility_name": "Facility 3",
    "case_number": "33331",
    "specialty": "apples"
},
{
    "primary_surgeon_name": "AppleSeed, John",
    "supplies": [],
    "facility_name": "Facility 3",
    "case_number": "33332",
    "specialty": "apples"
},
{
    "primary_surgeon_name": "AppleSeed, John",
    "supplies": [],
    "facility_name": "Facility 3",
    "case_number": "33333",
    "specialty": "apples"
},...]

After the first clause WHERE primary_surgeon_name IN ('AppleSeed, John') we would be left with only the documents that had the primary surgeon name of 'AppleSeed, John'. After the Next clause (Select) we would then select the attributes we want to see in the result set with any attribute mapping that needs to be done. The result set of documents after the SELECT would be:

[{
    "column": "AppleSeed, John",
    "supplies": [],
    "facility_name": "Facility 3",
    "case_number": "33331"
},
{
    "column": "AppleSeed, John",
    "supplies": [],
    "facility_name": "Facility 3",
    "case_number": "33332"
},
{
    "column": "AppleSeed, John",
    "supplies": [],
    "facility_name": "Facility 3",
    "case_number": "33333"
},...]

The last clause is LIMIT, so in this case if there are more than 100 documents in our result set after the SELECT clause the LIMIT clause will only return the first 100 Documents in the result set.

Complex Example¶

The user can annotate their Flows with comments, both line comments (//) and block comments (/* ... */)

Also of note is that if a string contains a single quote you can escape it with another single quote.

Step-by-Step:

Flow starts with a collection (dictionary) called provider.
WHERE - Restrict the collection to just a handful of surgeons for a specific hospital.
LIMIT - Then limit those results to the “first” 1,000.
JOIN - Next clause joins to a new collection with a condition of encounter and hospital_system_name. The join will create an array of subdocuments (in this case called sc) that match each document from the LEFT document.
UNGROUP BY - It is sometimes necessary to expand all of the subdocuments from a document. In this case, the root document is a surgeon with an array of N surgery cases. UNGROUP-ing them will create N new documents of provider with one surgery case for each input document. For instance, if one provider document contained 3 surgery cases, then the result is 3 new documents, where the surgery_case is no longer an array, instead it’s just a single subdocument.
SELECT - This SELECT restricts the input document to just 2 attributes encounter_id and case_number.
JOIN - Another join occurs with supply_used
UNGROUP BY - Followed by another ungroup
SELECT - Now SELECT the working resultant to 4 attributes
SELECT - This SELECT will keep the 4 attributes from the previous SELECT and also add a field description with a value determined by the IF statement (after comment ‘get total price by encounter’)
ORDER BY - Order the results by description and then case_number
GROUP BY - Groups the results by encounter_id, pairing each unique encounter_id with a total sum of unit_price_us’s that fell into that encounter_id’s bucket
ORDER BY - Order again by cost then by encounter_id in descending order
SKIP - Lastly skip the first 10 documents in the result set

/*
 * simple summing test for data
 */
WHERE hospital_system_name = 'XYZ'
    AND primary_surgeon_name IN ('Sothoth, Yog', 'Nyar''lathotep', 'C''thulhu, Mc', 'Azathoth')
LIMIT 1000
//
// surgery case
//
JOIN surgery_case as sc ON sc.encounter_id = encounter_id
    AND sc.hospital_system_name = 'XYZ'
UNGROUP BY sc
SELECT encounter_id, sc.case_number as case_number
//
// supply used
//
JOIN supply_used as si ON si.encounter_id = encounter_id
    AND si.hospital_system_name = 'XYZ'
UNGROUP BY si keep
SELECT encounter_id,
    case_number,
    si.item_quantity as item_quantity,
    si.unit_price_us as unit_price_us
//
// get total price by encounter
//
SELECT
    *,
    IF(description ISNULL
            THEN CONCAT(case_number, '*')
            ELSE CONCAT(description, ' (', case_number, ')')) AS description
ORDER BY description, case_number
GROUP BY encounter_id COMBINE first(encounter_id) as encounter_id, sum(unit_price_us) as cost
ORDER BY cost, encounter_id DESC
SKIP 10