Create automatic grids to use for block processing of array-like objects
We provide various utility functions to create grids that can be used for block processing of array-like objects:
- defaultAutoGrid() is the default automatic grid maker. It creates a grid that is suitable for block processing of the array-like object passed to it.
- rowAutoGrid() and colAutoGrid() are more specialized automatic grid makers, for the 2-dimensional case. They can be used to create a grid where the blocks are made of full rows or full columns, respectively.
- defaultSinkAutoGrid() is a specialized version of defaultAutoGrid() for creating a grid that is suitable for writing to a RealizationSink derivative while walking on it.
Usage
defaultAutoGrid(x, block.length=NULL, chunk.grid=NULL, block.shape=NULL)
## Two specialized "automatic grid makers" for the 2-dimensional case:
rowAutoGrid(x, nrow=NULL, block.length=NULL)
colAutoGrid(x, ncol=NULL, block.length=NULL)
## Replace default automatic grid maker with user-defined one:
getAutoGridMaker()
setAutoGridMaker(GRIDMAKER="defaultAutoGrid")
## A specialized version of defaultAutoGrid() to create an automatic
## grid on a RealizationSink derivative:
defaultSinkAutoGrid(sink, block.length=NULL, chunk.grid=NULL)
Arguments
- x
An array-like or matrix-like object for defaultAutoGrid.
A matrix-like object for rowAutoGrid and colAutoGrid.
- block.length
The length of the blocks i.e. the number of array elements per block. By default the automatic block length (returned by getAutoBlockLength(type(x)), or getAutoBlockLength(type(sink)) in the case of defaultSinkAutoGrid()) is used. Depending on how much memory is available on your machine, you might want to increase (or decrease) the automatic block length by adjusting the automatic block size with setAutoBlockSize().
- chunk.grid
The grid of physical chunks. By default chunkGrid(x) (or chunkGrid(sink) in the case of defaultSinkAutoGrid()) is used.
- block.shape
A string specifying the shape of the blocks. See makeCappedVolumeBox for a description of the supported shapes. By default getAutoBlockShape() is used.
- nrow
The number of rows of the blocks. The bottommost blocks might have less. See examples below.
- ncol
The number of columns of the blocks. The rightmost blocks might have less. See examples below.
- GRIDMAKER
The function to use as automatic grid maker, that is, the function that will be used by blockApply() and blockReduce() to make a grid when no grid is supplied via their grid argument. The function will be called on array-like object x and must return an ArrayGrid object, say grid, that is compatible with x i.e. such that refdim(grid) is identical to dim(x).
GRIDMAKER can be specified as a function or as a single string naming a function. It can be a user-defined function or a pre-defined grid maker like defaultAutoGrid, rowAutoGrid, or colAutoGrid.
The automatic grid maker is set to defaultAutoGrid at package startup and can be reset anytime to this value by calling setAutoGridMaker() with no argument.
- sink
A RealizationSink derivative.
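As a rough illustration of how block.length relates to the automatic block size, the sketch below assumes that the DelayedArray package is attached and that an array element of type "double" takes 8 bytes. The matrix m and the 800-byte block size are arbitrary values chosen for the illustration only.

```r
library(DelayedArray)

m <- matrix(runif(600), ncol=12)   # 50 x 12 matrix of type "double"

## Temporarily shrink the automatic block size to 800 bytes, that is,
## 100 array elements of type "double" (assuming 8 bytes per element):
old_size <- getAutoBlockSize()
setAutoBlockSize(800)

auto_len <- getAutoBlockLength(type(m))

## 'grid1' relies on the automatic block length, 'grid2' gets the same
## length passed explicitly via 'block.length':
grid1 <- defaultAutoGrid(m)
grid2 <- defaultAutoGrid(m, block.length=auto_len)

## Restore the previous automatic block size:
setAutoBlockSize(old_size)

auto_len
dim(grid1)
dim(grid2)
```

Restoring the previous size at the end keeps the change from leaking into later block processing in the same session.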
Details
By default, primary block processing functions blockApply()
and blockReduce() use the grid returned by
defaultAutoGrid(x) to walk on the blocks of array-like
object x. This can be changed with setAutoGridMaker().
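To make this concrete, the following sketch compares a call to blockApply() with no grid to one where the grid returned by defaultAutoGrid() is passed explicitly. It assumes that the DelayedArray package is attached and that the automatic grid maker is at its default value; the matrix m and the use of sum() as block-level function are illustrative choices only.

```r
library(DelayedArray)

setAutoGridMaker()   # make sure the default automatic grid maker is in use
m <- matrix(1:600, ncol=12)

## No grid supplied: blockApply() asks the automatic grid maker for one.
res1 <- blockApply(m, sum)

## Same grid, this time supplied explicitly via the 'grid' argument.
res2 <- blockApply(m, sum, grid=defaultAutoGrid(m))

identical(res1, res2)
```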
By default sinkApply() uses the grid returned by
defaultSinkAutoGrid(sink) to walk on the viewports of
RealizationSink derivative sink and write to them.
Value
defaultAutoGrid: An ArrayGrid object on reference
array x. The grid elements define the blocks that will be used to
process x by block. The grid is optimal in the sense that:
- It's compatible with the grid of physical chunks a.k.a. chunk grid. This means that, when the chunk grid is known (i.e. when chunkGrid(x) is not NULL or chunk.grid is supplied), every block in the grid contains one or more full chunks. In other words, chunks never cross block boundaries.
- Its resolution is such that the blocks have a length that is as close as possible to (but does not exceed) block.length. An exception is made when some chunks already have a length that is >= block.length, in which case the returned grid is the same as the chunk grid.
Note that the returned grid is regular (i.e. is a RegularArrayGrid object) unless the chunk grid is not regular (i.e. is an ArbitraryArrayGrid object).
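The chunk-compatibility property can be checked on a small case. The sketch below assumes that the DelayedArray package is attached. Since an ordinary matrix has no physical chunks, it supplies a made-up chunk grid (regular 10 x 4 chunks on a 50 x 12 matrix) via chunk.grid; the names chunk_dim and chunk_grid are hypothetical and used for illustration only.

```r
library(DelayedArray)

m <- matrix(runif(600), ncol=12)

## Hypothetical chunk grid: regular 10 x 4 chunks (40 elements per chunk).
chunk_dim <- c(10L, 4L)
chunk_grid <- RegularArrayGrid(dim(m), spacings=chunk_dim)

grid <- defaultAutoGrid(m, block.length=120, chunk.grid=chunk_grid)

## The blocks should not exceed 'block.length':
maxlength(grid) <= 120

## Every block should start on a chunk boundary along each dimension.
## Because the blocks tile the whole array, this implies that no chunk
## crosses a block boundary.
starts_on_chunk_boundary <- vapply(seq_along(grid),
    function(i) all((start(grid[[i]]) - 1L) %% chunk_dim == 0L),
    logical(1))
all(starts_on_chunk_boundary)
```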
rowAutoGrid: A RegularArrayGrid object on
reference array x where the grid elements define blocks made
of full rows of x.
colAutoGrid: A RegularArrayGrid object on
reference array x where the grid elements define blocks made
of full columns of x.
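As a small illustration of the two ways of sizing row blocks, the sketch below builds one grid from nrow and one from block.length, then looks at their dimensions. It assumes that the DelayedArray package is attached; m is an arbitrary 50 x 12 matrix.

```r
library(DelayedArray)

m <- matrix(runif(600), ncol=12)

## Size the blocks by number of rows ...
grid1 <- rowAutoGrid(m, nrow=15)
## ... or by capping the number of array elements per block.
grid2 <- rowAutoGrid(m, block.length=80)

## Both grids have a single column of blocks, so every block spans all
## the columns of 'm', that is, is made of full rows:
dim(grid1)
dim(grid2)

## Largest number of array elements found in a block of 'grid2':
maxlength(grid2)
```

colAutoGrid() works the same way along the other dimension, with ncol in place of nrow.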
defaultSinkAutoGrid: Like defaultAutoGrid except
that defaultSinkAutoGrid always returns a grid with a
"first-dim-grows-first" shape (note that, unlike the former, the
latter has no block.shape argument).
The advantage of using a grid with a "first-dim-grows-first" shape in
the context of writing to the viewports of a RealizationSink
derivative is that such a grid is guaranteed to work with "linear write
only" realization backends. See important notes about "Cross realization
backend compatibility" in ?write_block in the
S4Arrays package for more information.
See also
- setAutoBlockSize and setAutoBlockShape to control the geometry of automatic blocks.
- blockApply and family for convenient block processing of an array-like object.
- ArrayGrid in the S4Arrays package for the formal representation of grids and viewports.
- The makeCappedVolumeBox utility to make capped volume boxes.
- read_block and write_block in the S4Arrays package.
Examples
## ---------------------------------------------------------------------
## A VERSION OF sum() THAT USES BLOCK PROCESSING
## ---------------------------------------------------------------------
block_sum <- function(a, grid) {
    sums <- lapply(grid, function(viewport) sum(read_block(a, viewport)))
    sum(unlist(sums))
}
## On an ordinary matrix:
m <- matrix(runif(600), ncol=12)
m_grid <- defaultAutoGrid(m, block.length=120)
sum1 <- block_sum(m, m_grid)
sum1
#> [1] 302.5836
## On a DelayedArray object:
library(HDF5Array)
#> Loading required package: h5mread
#> Loading required package: rhdf5
#>
#> Attaching package: ‘h5mread’
#> The following object is masked from ‘package:rhdf5’:
#>
#> h5ls
M <- as(m, "HDF5Array")
sum2 <- block_sum(M, m_grid)
sum2
#> [1] 302.5836
sum3 <- block_sum(M, colAutoGrid(M, block.length=120))
sum3
#> [1] 302.5836
sum4 <- block_sum(M, rowAutoGrid(M, block.length=80))
sum4
#> [1] 302.5836
## Sanity checks:
sum0 <- sum(m)
stopifnot(identical(sum1, sum0))
stopifnot(identical(sum2, sum0))
stopifnot(identical(sum3, sum0))
stopifnot(identical(sum4, sum0))
## ---------------------------------------------------------------------
## defaultAutoGrid()
## ---------------------------------------------------------------------
grid <- defaultAutoGrid(m, block.length=120)
grid
#> 5 x 2 RegularArrayGrid object on a 50 x 12 array:
#> [,1] [,2]
#> [1,] [1-11,1-10] [1-11,11-12]
#> [2,] [12-22,1-10] [12-22,11-12]
#> [3,] [23-33,1-10] [23-33,11-12]
#> [4,] [34-44,1-10] [34-44,11-12]
#> [5,] [45-50,1-10] [45-50,11-12]
as.list(grid) # turn the grid into a list of ArrayViewport objects
#> [[1]]
#> 11 x 10 ArrayViewport object on a 50 x 12 array: [1-11,1-10]
#>
#> [[2]]
#> 11 x 10 ArrayViewport object on a 50 x 12 array: [12-22,1-10]
#>
#> [[3]]
#> 11 x 10 ArrayViewport object on a 50 x 12 array: [23-33,1-10]
#>
#> [[4]]
#> 11 x 10 ArrayViewport object on a 50 x 12 array: [34-44,1-10]
#>
#> [[5]]
#> 6 x 10 ArrayViewport object on a 50 x 12 array: [45-50,1-10]
#>
#> [[6]]
#> 11 x 2 ArrayViewport object on a 50 x 12 array: [1-11,11-12]
#>
#> [[7]]
#> 11 x 2 ArrayViewport object on a 50 x 12 array: [12-22,11-12]
#>
#> [[8]]
#> 11 x 2 ArrayViewport object on a 50 x 12 array: [23-33,11-12]
#>
#> [[9]]
#> 11 x 2 ArrayViewport object on a 50 x 12 array: [34-44,11-12]
#>
#> [[10]]
#> 6 x 2 ArrayViewport object on a 50 x 12 array: [45-50,11-12]
#>
table(lengths(grid))
#>
#> 12 22 60 110
#> 1 4 1 4
stopifnot(maxlength(grid) <= 120)
grid <- defaultAutoGrid(m, block.length=120,
                        block.shape="first-dim-grows-first")
grid
#> 1 x 6 RegularArrayGrid object on a 50 x 12 array:
#> [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,] [ ,1-2] [ ,3-4] [ ,5-6] [ ,7-8] [ ,9-10] [ ,11-12]
table(lengths(grid))
#>
#> 100
#> 6
stopifnot(maxlength(grid) <= 120)
grid <- defaultAutoGrid(m, block.length=120,
                        block.shape="last-dim-grows-first")
grid
#> 5 x 1 RegularArrayGrid object on a 50 x 12 array:
#> [,1]
#> [1,] [1-10, ]
#> [2,] [11-20, ]
#> [3,] [21-30, ]
#> [4,] [31-40, ]
#> [5,] [41-50, ]
table(lengths(grid))
#>
#> 120
#> 5
stopifnot(maxlength(grid) <= 120)
defaultAutoGrid(m, block.length=100)
#> 5 x 2 RegularArrayGrid object on a 50 x 12 array:
#> [,1] [,2]
#> [1,] [1-10,1-10] [1-10,11-12]
#> [2,] [11-20,1-10] [11-20,11-12]
#> [3,] [21-30,1-10] [21-30,11-12]
#> [4,] [31-40,1-10] [31-40,11-12]
#> [5,] [41-50,1-10] [41-50,11-12]
defaultAutoGrid(m, block.length=75)
#> 6 x 2 RegularArrayGrid object on a 50 x 12 array:
#> [,1] [,2]
#> [1,] [1-9,1-8] [1-9,9-12]
#> [2,] [10-18,1-8] [10-18,9-12]
#> [3,] [19-27,1-8] [19-27,9-12]
#> [4,] [28-36,1-8] [28-36,9-12]
#> [5,] [37-45,1-8] [37-45,9-12]
#> [6,] [46-50,1-8] [46-50,9-12]
defaultAutoGrid(m, block.length=25)
#> 10 x 3 RegularArrayGrid object on a 50 x 12 array:
#> [,1] [,2] [,3]
#> [1,] [1-5,1-5] [1-5,6-10] [1-5,11-12]
#> [2,] [6-10,1-5] [6-10,6-10] [6-10,11-12]
#> [3,] [11-15,1-5] [11-15,6-10] [11-15,11-12]
#> [4,] [16-20,1-5] [16-20,6-10] [16-20,11-12]
#> [5,] [21-25,1-5] [21-25,6-10] [21-25,11-12]
#> [6,] [26-30,1-5] [26-30,6-10] [26-30,11-12]
#> [7,] [31-35,1-5] [31-35,6-10] [31-35,11-12]
#> [8,] [36-40,1-5] [36-40,6-10] [36-40,11-12]
#> [9,] [41-45,1-5] [41-45,6-10] [41-45,11-12]
#> [10,] [46-50,1-5] [46-50,6-10] [46-50,11-12]
defaultAutoGrid(m, block.length=20)
#> 10 x 3 RegularArrayGrid object on a 50 x 12 array:
#> [,1] [,2] [,3]
#> [1,] [1-5,1-4] [1-5,5-8] [1-5,9-12]
#> [2,] [6-10,1-4] [6-10,5-8] [6-10,9-12]
#> [3,] [11-15,1-4] [11-15,5-8] [11-15,9-12]
#> [4,] [16-20,1-4] [16-20,5-8] [16-20,9-12]
#> [5,] [21-25,1-4] [21-25,5-8] [21-25,9-12]
#> [6,] [26-30,1-4] [26-30,5-8] [26-30,9-12]
#> [7,] [31-35,1-4] [31-35,5-8] [31-35,9-12]
#> [8,] [36-40,1-4] [36-40,5-8] [36-40,9-12]
#> [9,] [41-45,1-4] [41-45,5-8] [41-45,9-12]
#> [10,] [46-50,1-4] [46-50,5-8] [46-50,9-12]
defaultAutoGrid(m, block.length=10)
#> 17 x 4 RegularArrayGrid object on a 50 x 12 array:
#> [,1] [,2] [,3] [,4]
#> [1,] [1-3,1-3] [1-3,4-6] [1-3,7-9] [1-3,10-12]
#> [2,] [4-6,1-3] [4-6,4-6] [4-6,7-9] [4-6,10-12]
#> [3,] [7-9,1-3] [7-9,4-6] [7-9,7-9] [7-9,10-12]
#> [4,] [10-12,1-3] [10-12,4-6] [10-12,7-9] [10-12,10-12]
#> [5,] [13-15,1-3] [13-15,4-6] [13-15,7-9] [13-15,10-12]
#> [6,] [16-18,1-3] [16-18,4-6] [16-18,7-9] [16-18,10-12]
#> [7,] [19-21,1-3] [19-21,4-6] [19-21,7-9] [19-21,10-12]
#> [8,] [22-24,1-3] [22-24,4-6] [22-24,7-9] [22-24,10-12]
#> [9,] [25-27,1-3] [25-27,4-6] [25-27,7-9] [25-27,10-12]
#> [10,] [28-30,1-3] [28-30,4-6] [28-30,7-9] [28-30,10-12]
#> [11,] [31-33,1-3] [31-33,4-6] [31-33,7-9] [31-33,10-12]
#> [12,] [34-36,1-3] [34-36,4-6] [34-36,7-9] [34-36,10-12]
#> [13,] [37-39,1-3] [37-39,4-6] [37-39,7-9] [37-39,10-12]
#> [14,] [40-42,1-3] [40-42,4-6] [40-42,7-9] [40-42,10-12]
#> [15,] [43-45,1-3] [43-45,4-6] [43-45,7-9] [43-45,10-12]
#> [16,] [46-48,1-3] [46-48,4-6] [46-48,7-9] [46-48,10-12]
#> [17,] [49-50,1-3] [49-50,4-6] [49-50,7-9] [49-50,10-12]
## ---------------------------------------------------------------------
## rowAutoGrid() AND colAutoGrid()
## ---------------------------------------------------------------------
rowAutoGrid(m, nrow=10) # 5 blocks of 10 rows each
#> 5 x 1 RegularArrayGrid object on a 50 x 12 array:
#> [,1]
#> [1,] [1-10, ]
#> [2,] [11-20, ]
#> [3,] [21-30, ]
#> [4,] [31-40, ]
#> [5,] [41-50, ]
rowAutoGrid(m, nrow=15) # 3 blocks of 15 rows each plus 1 block of 5 rows
#> 4 x 1 RegularArrayGrid object on a 50 x 12 array:
#> [,1]
#> [1,] [1-15, ]
#> [2,] [16-30, ]
#> [3,] [31-45, ]
#> [4,] [46-50, ]
colAutoGrid(m, ncol=5) # 2 blocks of 5 cols each plus 1 block of 2 cols
#> 1 x 3 RegularArrayGrid object on a 50 x 12 array:
#> [,1] [,2] [,3]
#> [1,] [ ,1-5] [ ,6-10] [ ,11-12]
## See '?RealizationSink' for advanced examples of user-implemented
## block processing using colAutoGrid() and a realization sink.
## ---------------------------------------------------------------------
## REPLACE DEFAULT AUTOMATIC GRID MAKER WITH USER-DEFINED ONE
## ---------------------------------------------------------------------
getAutoGridMaker()
#> [1] "defaultAutoGrid"
setAutoGridMaker(function(x) colAutoGrid(x, ncol=5))
getAutoGridMaker()
#> function (x)
#> colAutoGrid(x, ncol = 5)
#> <environment: 0x561f7e75b928>
blockApply(m, function(block) currentViewport())
#> [[1]]
#> 50 x 5 ArrayViewport object on a 50 x 12 array: [ ,1-5]
#>
#> [[2]]
#> 50 x 5 ArrayViewport object on a 50 x 12 array: [ ,6-10]
#>
#> [[3]]
#> 50 x 2 ArrayViewport object on a 50 x 12 array: [ ,11-12]
#>
## Reset automatic grid maker to factory settings:
setAutoGridMaker()