Trino Adding a New Function

Data Al Dente

Created @January 22, 2023

Updated @January 22, 2023

Trino Version 405

Outline

How to add a new function to Trino, based on this work:

https://github.com/trinodb/trino/issues/14725

https://github.com/trinodb/trino/compare/master...nathanwilk7:trino:nathanwilk7/operator-array-histogram?diff=unified

https://github.com/trinodb/trino/pull/16024

Trino requires all elements of an array to be the same type

select array_histogram(a) from (values (array[1,2,1]), (array[42,7,42,null])) t(a);
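For intuition about what each row of that query should produce, here is a minimal plain-Java sketch of the histogram semantics. It assumes null elements are skipped rather than counted. `ArrayHistogramSketch` and `histogram` are made-up names, and nothing here touches Trino's block APIs.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not Trino code: counts occurrences of each non-null element.
final class ArrayHistogramSketch
{
    private ArrayHistogramSketch() {}

    static <T> Map<T, Long> histogram(List<T> elements)
    {
        Map<T, Long> counts = new LinkedHashMap<>();
        for (T element : elements) {
            if (element == null) {
                continue; // assumption: nulls are ignored rather than counted
            }
            counts.merge(element, 1L, Long::sum);
        }
        return counts;
    }
}
```

With this sketch, `histogram(Arrays.asList(42, 7, 42, null))` maps 42 to 2 and 7 to 1.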

what is SPI? it stands for Service Provider Interface; the trino-spi module seems to hold the base primitives (Block, Type, the function annotations)

@ScalarFunction

value is the name of the function (i.e. how it's called from SQL)

why is the class final?

@TypeParameter

element type (of the array, when decorating the function)

also used so I can get the MapType as a param?

Type

for the element type

@SqlType

Trino type (the return type when decorating the function)

also used to get an ArrayBlock to read the input

Block

return value of the function

the most important abstraction for writing functions

position

slice

getters for java primitive types and object

offsets into positions

byte comparisons

write to blockbuilder

hash

get size and logical size

region size? what is a region?

weird size funcs…

some funcs are needed if you implement slice…

encoding

loaded into memory

children?

block implementations are usually collection types like rows, arrays, or maps (dictionary, variable width)

// Just created a throwaway test class to mess with blocks
// IntArrayBlock looks like the simplest one to show how things work
// args: arrayOffset = 1, positionCount = 2, valueIsNull, values
// assumption: this 4-arg constructor is package-private, so the test class has to live in io.trino.spi.block
Block block = new IntArrayBlock(1, 2, new boolean[] {false, false, true}, new int[] {2, 4, 6, 8});
block.getInt(i, 0); // i = 0 gives 4, i = 1 gives 6, i = 2 throws
block.isNull(i); // i = 0 gives false, i = 1 gives true, i = 2 throws
block.fixedSizeInBytesPerPosition(); // 5 (4 bytes for the int, 1 byte for the null flag)
block.getSizeInBytes(); // 10 (5 bytes * 2 positions)
block.getRegionSizeInBytes(100, 999); // 4995 (5 * 999), ignores the actual data
block.getPositionsSizeInBytes(null, 999); // 4995, ignores the first param
block.getRetainedSizeInBytes(); // 91, does some fancy Java magic to get the in-memory size
// Notes on the other methods:
getEstimatedDataSizeForStats // some kind of logical data size (null is 0, ints are 4)
retainedBytesForEachPart // skipped for now
getPositionCount
getInt
mayHaveNull // just a shallow check if the nulls array is non-null
isNull // just checks the position in the nulls array (if it's non-null)
getSingleValueBlock // creates a new IntArrayBlock with just this position copied into it
copyPositions // copies the given array of positions into a new IntArrayBlock
getRegion // new block of `length` positions starting at the position offset, sharing the same underlying arrays
copyRegion // same as getRegion but actually copies the underlying arrays
copyWithAppendedNull // copies the whole thing but puts a null on the end
getValuesSlice // not an override, gives direct access to the values
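The size numbers in those notes fall out of simple arithmetic. A minimal sketch, assuming each position costs 4 bytes for the int plus 1 byte for the null flag (which is what the 5 above suggests). `IntBlockSizes` and its method names are made up for illustration and are not Trino classes.

```java
// Hypothetical model of the size accounting in the notes above, not Trino code.
final class IntBlockSizes
{
    // Assumption: 4 bytes for the int value plus 1 byte for the null flag.
    static final int BYTES_PER_POSITION = Integer.BYTES + Byte.BYTES;

    private IntBlockSizes() {}

    // Logical size of a block exposing the given number of positions.
    static long sizeInBytes(int positionCount)
    {
        return (long) BYTES_PER_POSITION * positionCount;
    }

    // Depends only on the length, never on the start position or the stored data.
    static long regionSizeInBytes(int position, int length)
    {
        return (long) BYTES_PER_POSITION * length;
    }
}
```

This reproduces the 10 for two positions and the 4995 for a "region" of length 999, which would explain why asking about a region far outside the block still returns a number instead of throwing.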

note: creating a block in a test and stepping through it in a debugger is a nice way to quickly try out a bunch of its methods

blocks are immutable once created
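The usual way to get an immutable value like this is to fill a mutable builder and then take a snapshot. Here is a toy sketch of that pattern in plain Java. `ToyIntBlockBuilder`, `Built`, and the method names are hypothetical and only loosely modeled on the block builder idea; this is not the Trino BlockBuilder API.

```java
import java.util.Arrays;

// Hypothetical toy, not Trino code: a mutable builder that produces a read-only snapshot.
final class ToyIntBlockBuilder
{
    private int[] values = new int[4];
    private boolean[] valueIsNull = new boolean[4];
    private int positionCount;

    ToyIntBlockBuilder writeInt(int value)
    {
        ensureCapacity();
        values[positionCount++] = value;
        return this;
    }

    ToyIntBlockBuilder appendNull()
    {
        ensureCapacity();
        valueIsNull[positionCount++] = true;
        return this;
    }

    // Copies the filled prefix so later writes to the builder cannot change the result.
    Built build()
    {
        return new Built(Arrays.copyOf(values, positionCount), Arrays.copyOf(valueIsNull, positionCount));
    }

    private void ensureCapacity()
    {
        if (positionCount == values.length) {
            values = Arrays.copyOf(values, values.length * 2);
            valueIsNull = Arrays.copyOf(valueIsNull, valueIsNull.length * 2);
        }
    }

    // Read-only result: no setters, and the arrays are private copies.
    static final class Built
    {
        private final int[] values;
        private final boolean[] valueIsNull;

        private Built(int[] values, boolean[] valueIsNull)
        {
            this.values = values;
            this.valueIsNull = valueIsNull;
        }

        int getPositionCount() { return values.length; }

        int getInt(int position) { return values[position]; }

        boolean isNull(int position) { return valueIsNull[position]; }
    }
}
```

Writing to the builder after calling `build()` leaves the earlier snapshot untouched, which is the property the function code relies on when it hands a block back as its return value.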

it's confusing to me that positionCount can be smaller than the backing array (even 0 when the array is non-empty); the block only lets you access positionCount positions

I find the Block interface very strange since it has a bunch of default methods that just throw unsupported; why not have more fine-grained interfaces for things like byte, int, etc.?

why can you still getInt when it is null?
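My best guess at both puzzles (the position count versus the array length, and reading a null position) is that a block is a window over shared arrays, with the null flags kept in a separate array that the value getter never looks at. Here is a toy version of that idea in plain Java. `ToyIntBlock` is a hypothetical class written for this note, not Trino's IntArrayBlock, though it takes the same four arguments as the snippet above.

```java
// Hypothetical toy, not Trino code: a read-only window over shared arrays.
final class ToyIntBlock
{
    private final int arrayOffset;
    private final int positionCount;
    private final boolean[] valueIsNull; // may be null, meaning "no nulls at all"
    private final int[] values;

    ToyIntBlock(int arrayOffset, int positionCount, boolean[] valueIsNull, int[] values)
    {
        this.arrayOffset = arrayOffset;
        this.positionCount = positionCount;
        this.valueIsNull = valueIsNull;
        this.values = values;
    }

    int getPositionCount()
    {
        return positionCount;
    }

    // Reads the raw slot; the null flags are not consulted here.
    int getInt(int position)
    {
        checkPosition(position);
        return values[position + arrayOffset];
    }

    boolean isNull(int position)
    {
        checkPosition(position);
        return valueIsNull != null && valueIsNull[position + arrayOffset];
    }

    // Only positions inside the window are readable, however long the arrays are.
    private void checkPosition(int position)
    {
        if (position < 0 || position >= positionCount) {
            throw new IllegalArgumentException("position is not valid: " + position);
        }
    }
}
```

With `new ToyIntBlock(1, 2, new boolean[] {false, false, true}, new int[] {2, 4, 6, 8})`, position 1 reports `isNull` as true while `getInt(1)` still returns 6, and position 2 throws even though the backing array has four entries. So the caller is expected to check `isNull` first; the stored int at a null position is just whatever happens to be in the slot.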

Here are the input block classes for the array_histogram function

Simple Blocks
- ByteArrayBlock
- IntArrayBlock
- LongArrayBlock

VariableWidthBlock (Slice with offsets into it?)

ArrayBlock

MapBlock

RowBlock
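On the "Slice with offsets into it?" guess for VariableWidthBlock: the layout I have in mind is one contiguous run of bytes plus an offsets array, where value i lives between offsets[i] and offsets[i + 1]. A toy sketch of that layout, using a plain byte array in place of a Slice. `ToyVariableWidthBlock` and its methods are hypothetical names, not Trino code.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical toy, not Trino code: variable-width values packed into one byte array.
final class ToyVariableWidthBlock
{
    private final byte[] bytes;  // all values concatenated back to back
    private final int[] offsets; // value i occupies bytes[offsets[i] .. offsets[i + 1])

    ToyVariableWidthBlock(String... values)
    {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        offsets = new int[values.length + 1];
        for (int i = 0; i < values.length; i++) {
            byte[] encoded = values[i].getBytes(StandardCharsets.UTF_8);
            out.write(encoded, 0, encoded.length);
            offsets[i + 1] = out.size(); // end of value i, start of value i + 1
        }
        bytes = out.toByteArray();
    }

    int getPositionCount()
    {
        return offsets.length - 1;
    }

    // Length in bytes of the value at this position.
    int getLength(int position)
    {
        return offsets[position + 1] - offsets[position];
    }

    String getString(int position)
    {
        return new String(bytes, offsets[position], getLength(position), StandardCharsets.UTF_8);
    }
}
```

The same offsets trick would also fit an array of arrays: one flat block of elements plus offsets marking where each inner array starts and ends, which is how I picture ArrayBlock.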

BlockBuilder

Slice

interesting low-level byte storage

@OperatorDependency

some way to get equal and hash operators for elements?
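The reason this matters, as I understand it: the function is generic over the element type, so it cannot hardcode how two elements are compared and has to be handed the operators from outside. A plain-Java sketch of that idea, with `BiPredicate` standing in for an injected equality operator. `InjectedOperatorsSketch` and `countDistinct` are hypothetical names, and this is not how Trino wires the operators up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

// Hypothetical sketch, not Trino code: generic logic that gets its equality operator from the caller.
final class InjectedOperatorsSketch
{
    private InjectedOperatorsSketch() {}

    // Counts distinct values using only the supplied equality operator.
    static <T> int countDistinct(List<T> values, BiPredicate<T, T> equal)
    {
        List<T> seen = new ArrayList<>();
        for (T value : values) {
            boolean found = false;
            for (T candidate : seen) {
                if (equal.test(candidate, value)) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                seen.add(value);
            }
        }
        return seen.size();
    }
}
```

Swapping the operator changes the answer without touching the generic code: `countDistinct(Arrays.asList("a", "A", "b"), String::equals)` sees three distinct values, while passing `String::equalsIgnoreCase` sees two. A hash operator would let the linear scan above become a hash table lookup.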

BlockTypeOperators

@Convention

array specific

ArrayBlock

map specific

MapBlock

histogram specific

TypedHistogram

testing

docs