Flow¶
Disclaimer: work in progress, not complete yet.
Introduction¶
Flow or TQLFlow is a data pipeline that allows the user to designate TQL Clauses in various sequences for more advanced data query, mapping, transforms and more. In TQL (also SQL) the “steps” of execution are fixed as such:
5. SELECT <columns>
1. FROM <table>
2. WHERE <predicate on rows>
3. GROUP BY <columns> COMBINE <aggregator-function>
4. HAVING <predicate on groups> (Not currently available in TQL/TQLFlow)
6. ORDER BY <columns>
ref https://en.wikipedia.org/wiki/SQL_syntax
This form of data query, transform (especially schema-on-read) is better support with a more flexible query language. Flow has only a couple of limitations with regards to the order of the clauses. But first, here are the possible clauses or actions that can be performed:
Flow can be used with IQ and BurstChain® endpoints. In fact, in some cases just a WHERE
clause is required as a way of allowing the user to just supply that construct.
Stages¶
Clause | Description |
---|---|
SELECT |
The same construct as TQL (see 1 below), except without Aggregator functions. |
SUBFLOW |
Pass documents at this clause through any number of flow aggregations, with each aggregation result outputting an array in a single return document |
JOIN |
Similar to TQL; but only allows for JOIN , and stores joined in an array. |
WHERE |
Same as TQL’s WHERE clause, filters the set of documents at the current point in TQLFlow. |
GROUP BY |
This will group documents into a list based on the attribute/s listed. (see 2 below) |
UNGROUP BY |
This will unwind a list of values into individual documents based on the attribute provided. |
ORDER BY |
Orders the current set of documents at this point of the TQLFlow. |
SKIP |
Skips the number of documents specified for the current set of documents at this stage in the TQLFlow. |
LIMIT |
Similar to TQL, limits the number of documents at the current stage in the TQLFlow. |
FINALIZE |
This is a final anonymous JavaScript function. Consumes one document and must return one document. Can support more complex transforms than possible in the support stages above. (NOTE: Must be the last clause/action ) |
1 All of the functions allowed in SELECT
(as documented in TQL - Select Functions section) are available within Flow.
2 All of the functions allowed in GROUP BY
(as documented in TQL - Group By Functions section) are available within Flow.
To understand how this works a Flow always is executed against a collection or chain. Visualize that the entire set of documents in the collection are transformed in some way by the first clause. After the first clause is complete, that set of documents has been altered. The second clause, then acts upon the results of the previous clause. And so on until the end of the Flow. The Flow can consume a lot of resources to execute, it is usually a best practice to begin a Flow with some WHERE and/or SELECT that begins to reduce the starting collection or chain.
Clauses¶
SELECT¶
The same construct as TQL, except without GROUP BY
functions.
SELECT <field>, <field>, ... <field>
SELECT
can have one or more fields and field
can be any of these: column_name
, literals
, select-functions
, ALL
, *
Example¶
Input documents:
[
{
"primary_surgeon_name": "Smith, John",
"supplies": [],
"facility_name": "Facility 1",
"case_number": "11111",
"specialty": "medicine"
},
{
"primary_surgeon_name": "Doe, Jane",
"supplies": ["scissors"],
"facility_name": "Facility 2",
"case_number": "22222",
"specialty": "medicine"
}
]
SELECT
clause:
SELECT primary_surgeon_name,
facility_name AS facility,
['sheet drapes', 'soap'] AS supplies,
IF(supplies EQ []
THEN true,
ELSE false) AS had_supplies
Output documents:
[
{
"primary_surgeon_name": "Smith, John",
"supplies": ["sheet drapes", "soap"],
"facility": "Facility 1",
"had_supplies": false
},
{
"primary_surgeon_name": "Doe, Jane",
"supplies": ["sheet drapes", "soap"],
"facility": "Facility 2",
"had_supplies": true
}
]
For the same input…
SELECT
clause:
SELECT ALL,
facility_name AS facility,
'0' AS case_number,
Output documents:
[
{
"primary_surgeon_name": "Smith, John",
"supplies": [],
"facility_name": "Facility 1",
"facility": "Facility 1",
"case_number": "0",
"specialty": "medicine"
},
{
"primary_surgeon_name": "Doe, Jane",
"supplies": ["scissors"],
"facility": "Facility 2",
"case_number": "0",
"specialty": "medicine"
}
]
SUBFLOW¶
Pass documents at this stage through any number of flow aggregations, with each aggregation result outputting an array in a single return document
SUBFLOW
can contain any number of groups of flow clauses.
SUBFLOW <flow-stages> AS <alias>
SUBFLOW <flow-clauses> AS <alias1>,
<flow-clauses> AS <alias2>,
...,
<flow-clauses> AS <aliasN>
Examples¶
Input documents:
[
{
"x": 1,
"y": [1, 2, 3]
},
{
"x": 2,
"y": [1, 2, 3]
}
]
SUBFLOW
clause:
SUBFLOW [
SELECT x
] AS xs,
[
SELECT MAP(y USING elem WITH ADD(elem, x)) AS sum_yx
UNGROUP BY sum_xy
GROUP BY ALL COMBINE SUM(sum_xy) AS total_sum
] AS sum
Output documents:
[
{
"xs": [
{
"x": 1
},
{
"x": 2
}
],
"sum": [
{
"total_sum": 21
}
]
}
]
JOIN¶
Similar to TQL; but only allows for JOIN
, and stores joined in an array.
JOIN <table> [AS alias] ON <condition>
Examples¶
Input documents:
[
{
"id": "0",
"value": 1
},
{
"id": "1",
"value": 2
}
]
Table ‘lookup_table’ documents:
[
{
"lookup_id": "0",
"lookup_value": 1
},
{
"lookup_id": "0",
"lookup_value": 2
},
{
"lookup_id": "not present in input",
"lookup_value": 3
}
]
JOIN
statement:
JOIN lookup_table AS lt ON id EQ lt.lookup_id
Output documents:
[
{
"id": "0",
"value": 1,
"lt": [
{
"lookup_id": "0",
"lookup_value": 1
},
{
"lookup_id": "0",
"lookup_value": 2
}
]
},
{
"id": "1",
"value": 2,
"lt": []
}
]
WHERE¶
Same as TQL’s WHERE
, filters the set of documents at the current clause in TQLFlow.
WHERE <condition>
Examples¶
Input documents:
[
{
"x": 1,
"y": false
},
{
"x": 2,
"y": false
},
{
"x": 3,
"y": true
}
]
WHERE
clause:
WHERE x EQ 2 OR y EQ true
Output documents:
[
{
"x": 2,
"y": false
},
{
"x": 3,
"y": true
}
]
GROUP BY¶
This will group documents into a list based on the attribute/s listed.
GROUP BY <columns | * | ALL> COMBINE <group-by-function AS alias>+
Examples¶
Input documents:
[
{
"x": 1,
"y": false
},
{
"x": 2,
"y": false
},
{
"x": 3,
"y": true
}
]
GROUP BY
clause:
GROUP BY y SUM(x) AS x_sum, FIRST(y) AS y
Output documents:
[
{
"x_sum": 3,
"y": false
},
{
"x_sum": 3,
"y": true
}
]
For the same input…
GROUP BY
stage:
GROUP BY ALL SUM(x) AS x_sum
Output documents:
[
{
"x_sum": 6
}
]
UNGROUP BY¶
This will unwind a list of values into individual documents based on the attribute provided.
UNGROUP BY <column> [KEEP]
Examples¶
Input documents:
[
{
"x": [1, 2],
"y": true
},
{
"x": [],
"y": false
}
]
UNGROUP BY
clause:
UNGROUP BY x
Output documents:
[
{
"x": 1,
"y": true
},
{
"x": 2,
"y": true
}
]
For the same input…
UNGROUP BY
clause:
UNGROUP BY x KEEP
Output documents:
[
{
"x": 1,
"y": true
},
{
"x": 2,
"y": true
},
{
"y": false
}
]
ORDER BY¶
Orders the current set of documents at this point in the TQLFlow.
ORDER BY <columns> [DESC]
Examples¶
Input documents:
[
{
"x": 2
},
{
"x": 1
},
{
"x": 3
}
]
ORDER BY
clause:
ORDER BY x
Output documents:
[
{
"x": 1
},
{
"x": 2
},
{
"x": 3
}
]
For the same input…
ORDER BY
clause:
ORDER BY x DESC
Output documents:
[
{
"x": 3
},
{
"x": 2
},
{
"x": 1
}
]
SKIP¶
Skips the number of documents specified for the current set of documents at this point in the TQLFlow.
SKIP <integer>
Examples¶
Input documents:
[
{
"x": 1
},
{
"x": 2
},
{
"x": 3
},
{
"x": 4
}
]
SKIP
clause:
SKIP 2
Output documents:
[
{
"x": 3
},
{
"x": 4
}
]
LIMIT¶
Similar to TQL, limits the number of documents at the current point in the TQLFlow.
LIMIT <integer>
Examples¶
Input documents:
[
{
"x": 1
},
{
"x": 2
},
{
"x": 3
},
{
"x": 4
}
]
LIMIT
clause:
LIMIT 2
Output documents:
[
{
"x": 1
},
{
"x": 2
}
]
FINALIZE¶
This is a final anonymous JavaScript function. Consumes one document and must return one document. Can support more complex transforms than possible in the support stages above. (NOTE: Must be the last clause/action.)
FINALIZE function(<input-document-handle>) {<JavaScript>}
Examples¶
Input documents:
[
{
"x": 1,
"y": [1, 2, 3]
},
{
"x": 2,
"y": [1, 2, 3]
}
]
FINALIZE
clause:
FINALIZE function(doc) {
function add(a, b) {
return a + b;
}
return doc.x.map(elem => add(elem, doc.y));
Output Documents:
[
{
"x": 1,
"y": [2, 3, 4]
},
{
"x": 2,
"y": [3, 4, 5]
}
]
Complex Examples¶
Simple¶
This example looks very similar to a typical TQL call, here it is mandated to perform the WHERE
first, then SELECT columns,...
and LIMIT
the results. Yes this could have been just as easily written in TQL; but the flexibility of modifying the clause order wouldn’t exist.
WHERE primary_surgeon_name IN ('AppleSeed, John')
SELECT primary_surgeon_name AS column,
supplies,
facility_name,
case_numbers
LIMIT 100
In this basic example, lets say that we started with a collection that had these docs in them:
[{
"primary_surgeon_name": "Smith, John",
"supplies": [],
"facility_name": "Facility 1",
"case_number": "11111",
"specialty": "medicine"
},
{
"primary_surgeon_name": "Doe, Jane",
"supplies": [],
"facility_name": "Facility 2",
"case_number": "22222",
"specialty": "medicine"
},
{
"primary_surgeon_name": "AppleSeed, John",
"supplies": [],
"facility_name": "Facility 3",
"case_number": "33331",
"specialty": "apples"
},
{
"primary_surgeon_name": "AppleSeed, John",
"supplies": [],
"facility_name": "Facility 3",
"case_number": "33332",
"specialty": "apples"
},
{
"primary_surgeon_name": "AppleSeed, John",
"supplies": [],
"facility_name": "Facility 3",
"case_number": "33333",
"specialty": "apples"
},...]
After the first clause WHERE primary_surgeon_name IN ('AppleSeed, John')
we would be left with only the documents that had the primary surgeon name of 'AppleSeed, John'
.
After the Next clause (Select
) we would then select the attributes we want to see in the result set with any attribute mapping that needs to be done. The result set of documents after the SELECT
would be:
[{
"column": "AppleSeed, John",
"supplies": [],
"facility_name": "Facility 3",
"case_number": "33331"
},
{
"column": "AppleSeed, John",
"supplies": [],
"facility_name": "Facility 3",
"case_number": "33332"
},
{
"column": "AppleSeed, John",
"supplies": [],
"facility_name": "Facility 3",
"case_number": "33333"
},...]
The last clause is LIMIT
, so in this case if there are more than 100 documents in our result set after the SELECT
clause the LIMIT
clause will only return the first 100 Documents in the result set.
Complex Example¶
The user can annotate their Flows with comments, both line comments (//
) and block comments (/* ... */
)
Also of note is that if a string contains a single quote you can escape it with another single quote.
Step-by-Step:
- Flow starts with a collection (dictionary) called provider.
WHERE
- Restrict the collection to just a handful of surgeons for a specific hospital.LIMIT
- Then limit those results to the “first” 1,000.JOIN
- Next clause joins to a new collection with a condition of encounter and hospital_system_name. The join will create an array of subdocuments (in this case called sc) that match each document from the LEFT document.UNGROUP BY
- It is sometimes necessary to expand all of the subdocuments from a document. In this case, the root document is a surgeon with an array of N surgery cases. UNGROUP-ing them will create N new documents of provider with one surgery case for each input document. For instance, if one provider document contained 3 surgery cases, then the result is 3 new documents, where the surgery_case is no longer an array, instead it’s just a single subdocument.SELECT
- ThisSELECT
restricts the input document to just 2 attributesencounter_id
andcase_number
.JOIN
- Another join occurs with supply_usedUNGROUP BY
- Followed by another ungroupSELECT
- NowSELECT
the working resultant to 4 attributesSELECT
- ThisSELECT
will keep the 4 attributes from the previousSELECT
and also add a fielddescription
with a value determined by the IF statement (after comment ‘get total price by encounter’)ORDER BY
- Order the results by description and then case_numberGROUP BY
- Groups the results by encounter_id, pairing each unique encounter_id with a total sum of unit_price_us’s that fell into that encounter_id’s bucketORDER BY
- Order again by cost then by encounter_id in descending orderSKIP
- Lastly skip the first 10 documents in the result set
/*
* simple summing test for data
*/
WHERE hospital_system_name = 'XYZ'
AND primary_surgeon_name IN ('Sothoth, Yog', 'Nyar''lathotep', 'C''thulhu, Mc', 'Azathoth')
LIMIT 1000
//
// surgery case
//
JOIN surgery_case as sc ON sc.encounter_id = encounter_id
AND sc.hospital_system_name = 'XYZ'
UNGROUP BY sc
SELECT encounter_id, sc.case_number as case_number
//
// supply used
//
JOIN supply_used as si ON si.encounter_id = encounter_id
AND si.hospital_system_name = 'XYZ'
UNGROUP BY si keep
SELECT encounter_id,
case_number,
si.item_quantity as item_quantity,
si.unit_price_us as unit_price_us
//
// get total price by encounter
//
SELECT
*,
IF(description ISNULL
THEN CONCAT(case_number, '*')
ELSE CONCAT(description, ' (', case_number, ')')) AS description
ORDER BY description, case_number
GROUP BY encounter_id COMBINE first(encounter_id) as encounter_id, sum(unit_price_us) as cost
ORDER BY cost, encounter_id DESC
SKIP 10