We could not find clear definitions of these properties in the official documentation or the source code, so we examined the properties listed in the Spark Web UI and copied their descriptions into the table below; a sketch of how to read the same metrics programmatically follows the table.
| Property | Category | Reference | Description |
|---|---|---|---|
| avg hash probe bucket list iters | Cost | Link | the average bucket list iterations per lookup during aggregation |
| data size | Cardinality | Link | the estimated size of broadcast/shuffled/collected data of the operator |
| data size of build side | Cost | Link | the size of the built hash map |
| fetch wait time | Cost | Link | the time spent on fetching data (local and remote) |
| local blocks read | Cardinality | Link | the number of blocks read locally |
| local bytes read | Cardinality | Link | the number of bytes read locally |
| metadata time | Cost | Link | the time spent on getting metadata like the number of partitions, number of files |
| number of output rows | Cardinality | Link | the number of output rows of the operator |
| peak memory | Cost | Link | the peak memory usage in the operator |
| records read | Cardinality | Link | the number of read records |
| remote blocks read | Cardinality | Link | the number of blocks read remotely |
| remote bytes read | Cardinality | Link | the number of bytes read remotely |
| remote bytes read to disk | Cardinality | Link | the number of bytes read from remote to local disk |
| scan time | Cost | Link | the time spent on scanning data |
| shuffle bytes written | Cardinality | Link | the number of bytes written |
| shuffle records written | Cardinality | Link | the number of records written |
| shuffle write time | Cost | Link | the time spent on shuffle writing |
| sort time | Cost | Link | the time spent on sorting |
| spill size | Cost | Link | the number of bytes spilled to disk from memory in the operator |
| time in aggregation build | Cost | Link | the time spent on aggregation |
| time to build hash map | Cost | Link | the time spent on building a hash map |
| time to collect | Cost | Link | the time spent on collecting data |
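
These metrics can also be read programmatically rather than from the Web UI. Below is a minimal Scala sketch, assuming Spark 3.x: it runs a toy aggregation (our own illustration, not taken from the table above), then walks the executed physical plan and prints each operator's SQL metrics, which are the same values the Web UI renders. Adaptive query execution is disabled so the plan tree can be traversed directly; with AQE enabled, the final operators are nested inside an `AdaptiveSparkPlanExec` node and are not reachable by a plain tree walk.

```scala
import org.apache.spark.sql.SparkSession

object SqlMetricsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("sql-metrics-sketch")
      .master("local[*]")
      // Disable AQE so the executed plan is a plain operator tree
      // rather than being wrapped in an AdaptiveSparkPlanExec node.
      .config("spark.sql.adaptive.enabled", "false")
      .getOrCreate()
    import spark.implicits._

    // A toy aggregation that triggers a shuffle, so metrics such as
    // "number of output rows" and "shuffle bytes written" get populated.
    val df = spark.range(100000)
      .groupBy(($"id" % 10).as("bucket"))
      .count()
    df.collect() // force execution so the metrics are filled in

    // Walk the executed physical plan and print every operator's
    // SQL metrics: the same values shown in the Spark Web UI.
    df.queryExecution.executedPlan.foreach { op =>
      op.metrics.foreach { case (key, metric) =>
        println(s"${op.nodeName}: ${metric.name.getOrElse(key)} = ${metric.value}")
      }
    }

    spark.stop()
  }
}
```

Running this locally prints lines such as `HashAggregate: number of output rows = 10`, where the metric names correspond to the property names listed in the table.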