Trino Adding a New Function
How to add a new function to Trino, based on this work:
https://github.com/trinodb/trino/issues/14725
https://github.com/trinodb/trino/pull/16024
trino requires arrays to be all the same type
select array_histogram(a) from (values (array[1,2,1]), (array[42,7,42,null])) t(a);
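To pin down what the query above should return, here is a minimal plain-Java sketch of the semantics as I understand them: count each distinct element and skip nulls. It is a model only, not Trino SPI code; the class name `ArrayHistogramSketch` and the use of `List<Integer>` instead of blocks are my own assumptions.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of the array_histogram semantics; not Trino SPI code.
final class ArrayHistogramSketch {
    private ArrayHistogramSketch() {}

    // Counts each distinct non-null element; null elements are skipped.
    static Map<Integer, Long> histogram(List<Integer> elements) {
        Map<Integer, Long> counts = new LinkedHashMap<>();
        for (Integer element : elements) {
            if (element == null) {
                continue; // assumption: nulls do not get an entry in the result
            }
            counts.merge(element, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(histogram(Arrays.asList(1, 2, 1)));         // {1=2, 2=1}
        System.out.println(histogram(Arrays.asList(42, 7, 42, null))); // {42=2, 7=1}
    }
}
```

The real function has to do the same thing against blocks and for any element type, which is what the rest of these notes are about.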
what is SPI? seems like it has the base primitives stuff
scalarfunction
- value is the name of the func (eg how it’s called in sql land)
why is class final?
typeparameter
- element type (in the array when decorating func)
- also used so i can get maptype as a param?
type
- for element type
sqltype
- trino type (return type when decorating func)
- also used to get an arrayblock to read the input
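A rough sketch of where each annotation sits, going only by the notes above. The three annotations here are stand-ins declared locally so the file compiles without Trino on the classpath; the real ones come from the Trino SPI and the real function works on blocks, not on `List` and `Map`. The names `AnnotationShapeSketch` and `ArrayHistogramShape` are hypothetical.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class AnnotationShapeSketch {
    private AnnotationShapeSketch() {}

    // Local stand-ins, NOT the Trino annotations; they only mirror the names in the notes.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface ScalarFunction { String value(); }

    @Retention(RetentionPolicy.RUNTIME)
    @Target({ElementType.METHOD, ElementType.PARAMETER})
    @interface TypeParameter { String value(); }

    @Retention(RetentionPolicy.RUNTIME)
    @Target({ElementType.METHOD, ElementType.PARAMETER})
    @interface SqlType { String value(); }

    // value = the name the function is called by in SQL
    @ScalarFunction("array_histogram")
    static final class ArrayHistogramShape {
        private ArrayHistogramShape() {}

        @TypeParameter("T")        // element type of the input array
        @SqlType("map(T, bigint)") // SQL return type
        static Map<Integer, Long> arrayHistogram(@SqlType("array(T)") List<Integer> array) {
            Map<Integer, Long> counts = new TreeMap<>();
            for (Integer element : array) {
                if (element != null) {
                    counts.merge(element, 1L, Long::sum);
                }
            }
            return counts;
        }
    }

    public static void main(String[] args) {
        ScalarFunction function = ArrayHistogramShape.class.getAnnotation(ScalarFunction.class);
        System.out.println(function.value());
        System.out.println(ArrayHistogramShape.arrayHistogram(Arrays.asList(3, 3, 5)));
    }
}
```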
block
- return value of func
- the most important abstraction for writing funcs
- position
- slice
- getters for java primitive types and object
- offsets into positions
- byte comparisons
- write to blockbuilder
- hash
- get size and logical size
- region size? what is a region?
- weird size funcs…
- some funcs are needed if you implement slice…
- encoding
- loaded into memory
- children?
- block implementations are usually collection types like rows, arrays, or maps (dictionary, variable width)
// Just created a random test class to mess with blocks
// IntArrayBlock looks like the simplest one to show how things work
Block block = new IntArrayBlock(1, 2, new boolean[] {false, false, true}, new int[] {2, 4, 6, 8});
block.getInt(i, 0) // i=0 gives 4, i=1 gives 6, i=2 throws
block.isNull(i) // i=0 gives false, i=1 gives true, i=2 throws
fixedSizeInBytesPerPosition() // 5 (4 bytes for int, 1 byte for null bool)
getSizeInBytes // 10 (2 positions x 5 bytes)
block.getRegionSizeInBytes(100, 999) // 4995 (999 x 5), ignores actual data
block.getPositionsSizeInBytes(null, 999) // 4995 (999 x 5), ignores first param
block.getRetainedSizeInBytes // 91, does some fancy java magic to get the size
getEstimatedDataSizeForStats // some kind of logical data size (null is 0, ints are 4)
retainedBytesForEachPart // skipped for now
getPositionCount
getInt
mayHaveNull // just a shallow check if the nulls array is non-null
isNull // just checks the position in the null array (if it's non-null)
getSingleValueBlock // creates new intarrayblock with just this position copied into it
copyPositions // copies the arr of positions into a new intarrayblock
getRegion // new block covering length positions starting at position offset, shares the underlying arrays
copyRegion // same as getRegion but actually copies the underlying arrays
copyWithAppendedNull // copies the whole thing but puts a null on the end
getValuesSlice // not override, gives direct values access
note creating a block in a test and then debugging it so you can quickly test a bunch of funcs is nice
blocks are immutable once created
it’s confusing to me that positionCount can be 0 even if the array is non-empty; the block only lets you access positionCount values, starting at the array offset
i find the block interface very strange since it has a bunch of default unsupported methods, why not have more fine-grained interfaces for things like byte, int, etc?
why can you still getInt when it is null?
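To make the offset and positionCount behaviour concrete, here is a toy model that reproduces the numbers observed above. It is not Trino's IntArrayBlock, just a stand-in I wrote with the same constructor argument order; `ToyIntBlock` and the 5-bytes-per-position constant are assumptions taken from those observations. In this model getInt reads the values array without looking at the null flags, which would be consistent with still getting an int back at a null position.

```java
// Toy model only: a window of positionCount values starting at arrayOffset.
final class ToyIntBlock {
    // 4 bytes for the int plus 1 byte for the null flag
    private static final int BYTES_PER_POSITION = Integer.BYTES + Byte.BYTES;

    private final int arrayOffset;
    private final int positionCount;
    private final boolean[] valueIsNull; // may be null, meaning "no nulls"
    private final int[] values;

    ToyIntBlock(int arrayOffset, int positionCount, boolean[] valueIsNull, int[] values) {
        if (arrayOffset < 0 || positionCount < 0) {
            throw new IllegalArgumentException("negative offset or position count");
        }
        if (values.length - arrayOffset < positionCount) {
            throw new IllegalArgumentException("values array too short");
        }
        if (valueIsNull != null && valueIsNull.length - arrayOffset < positionCount) {
            throw new IllegalArgumentException("null flags array too short");
        }
        this.arrayOffset = arrayOffset;
        this.positionCount = positionCount;
        this.valueIsNull = valueIsNull;
        this.values = values;
    }

    int getPositionCount() {
        return positionCount;
    }

    // Reads the raw value; the null flags are not consulted here.
    int getInt(int position) {
        checkPosition(position);
        return values[arrayOffset + position];
    }

    boolean isNull(int position) {
        checkPosition(position);
        return valueIsNull != null && valueIsNull[arrayOffset + position];
    }

    long getSizeInBytes() {
        return (long) BYTES_PER_POSITION * positionCount;
    }

    // Pure arithmetic on length; positionOffset and the actual data are ignored.
    long getRegionSizeInBytes(int positionOffset, int length) {
        return (long) BYTES_PER_POSITION * length;
    }

    private void checkPosition(int position) {
        if (position < 0 || position >= positionCount) {
            throw new IndexOutOfBoundsException("position " + position + " is outside the block");
        }
    }

    public static void main(String[] args) {
        ToyIntBlock block = new ToyIntBlock(1, 2, new boolean[] {false, false, true}, new int[] {2, 4, 6, 8});
        System.out.println(block.getInt(0) + " " + block.getInt(1));  // 4 6
        System.out.println(block.isNull(0) + " " + block.isNull(1)); // false true
        System.out.println(block.getSizeInBytes());                  // 10
    }
}
```

Values 2 and 8 sit in the backing array but are outside the window, so position 2 throws even though the array has four elements.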
Here are the input block classes for the array hist func
- Simple Blocks
- ByteArrayBlock
- IntArrayBlock
- LongArrayBlock
- VariableWidthBlock (Slice with offsets into it?)
- ArrayBlock
- MapBlock
- RowBlock
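On the VariableWidthBlock question in the list above, the "one buffer plus offsets into it" layout can be sketched in plain Java. This is a toy with a byte[] standing in for the Slice and no null handling; `ToyVariableWidthBlock` and its methods are hypothetical names, not the Trino class.

```java
import java.nio.charset.StandardCharsets;

// Toy model: all values concatenated in one buffer, with offsets marking the boundaries.
final class ToyVariableWidthBlock {
    private final byte[] bytes;  // stands in for the Slice
    private final int[] offsets; // value i spans offsets[i] (inclusive) to offsets[i + 1] (exclusive)

    private ToyVariableWidthBlock(byte[] bytes, int[] offsets) {
        this.bytes = bytes;
        this.offsets = offsets;
    }

    static ToyVariableWidthBlock of(String... values) {
        byte[][] encoded = new byte[values.length][];
        int[] offsets = new int[values.length + 1];
        for (int i = 0; i < values.length; i++) {
            encoded[i] = values[i].getBytes(StandardCharsets.UTF_8);
            offsets[i + 1] = offsets[i] + encoded[i].length;
        }
        byte[] bytes = new byte[offsets[values.length]];
        for (int i = 0; i < values.length; i++) {
            System.arraycopy(encoded[i], 0, bytes, offsets[i], encoded[i].length);
        }
        return new ToyVariableWidthBlock(bytes, offsets);
    }

    int getPositionCount() {
        return offsets.length - 1;
    }

    // Length in bytes of the value at this position.
    int getLength(int position) {
        return offsets[position + 1] - offsets[position];
    }

    String getString(int position) {
        return new String(bytes, offsets[position], getLength(position), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ToyVariableWidthBlock block = ToyVariableWidthBlock.of("a", "bcd", "");
        System.out.println(block.getPositionCount()); // 3
        System.out.println(block.getString(1));       // bcd
        System.out.println(block.getLength(2));       // 0
    }
}
```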
blockbuilder
slice
- interesting low level byte storage
operatordependency
- some way to get equal and hash operators for elements?
- blocktypeoperators
- convention
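My guess at why the function needs equal and hash operators handed to it: a histogram over an arbitrary element type cannot rely on Java's equals and hashCode, so the counting table has to call whatever operators it is given. The sketch below shows that idea with plain functional interfaces; `OperatorHistogramSketch`, the fixed bucket count, and the operator parameter names are all hypothetical and say nothing about how Trino actually wires this up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.BiPredicate;
import java.util.function.ToLongFunction;

// Counting table that uses injected hash and equality operators instead of hashCode/equals.
final class OperatorHistogramSketch<T> {
    private static final int BUCKETS = 16; // fixed size keeps the sketch short

    private static final class Entry<K> {
        final K key;
        long count;

        Entry(K key) {
            this.key = key;
        }
    }

    private final ToLongFunction<T> hashOperator;
    private final BiPredicate<T, T> equalOperator;
    private final List<List<Entry<T>>> buckets = new ArrayList<>();

    OperatorHistogramSketch(ToLongFunction<T> hashOperator, BiPredicate<T, T> equalOperator) {
        this.hashOperator = hashOperator;
        this.equalOperator = equalOperator;
        for (int i = 0; i < BUCKETS; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    void add(T value) {
        Entry<T> entry = find(value);
        if (entry == null) {
            entry = new Entry<>(value);
            bucketFor(value).add(entry);
        }
        entry.count++;
    }

    long count(T value) {
        Entry<T> entry = find(value);
        return entry == null ? 0 : entry.count;
    }

    private List<Entry<T>> bucketFor(T value) {
        int index = (int) Math.floorMod(hashOperator.applyAsLong(value), (long) BUCKETS);
        return buckets.get(index);
    }

    private Entry<T> find(T value) {
        for (Entry<T> entry : bucketFor(value)) {
            if (equalOperator.test(entry.key, value)) {
                return entry;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Case-insensitive operators: equality here differs from String.equals on purpose.
        OperatorHistogramSketch<String> histogram = new OperatorHistogramSketch<>(
                s -> s.toLowerCase(Locale.ROOT).hashCode(),
                String::equalsIgnoreCase);
        histogram.add("a");
        histogram.add("A");
        histogram.add("b");
        System.out.println(histogram.count("a")); // 2
        System.out.println(histogram.count("B")); // 1
    }
}
```

The hash and equal operators have to agree with each other (equal values must hash the same), otherwise equal keys land in different buckets and get counted separately.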
arr specific
- arrayblock
map specific
- mapblock
hist specific
- typedhistogram
testing
docs