Skip to content

Optimization

_SUCCESS

TIP

Ommit _SUCCESS files from being written to S3

  • mapreduce.fileoutputcommitter.marksuccessfuljobs = false

Small Files

  • spark.sql.files.maxPartitionBytes=33554432 (32mb)
  • spark.sql.shuffle.partitions=4

Parameters

  • spark.sql.adaptive.enabled = true

  • spark.sql.adaptive.coalescePartitions.enabled = true

  • spark.sql.adaptive.skewJoin.enabled = true

  • spark.sql.adaptive.localShuffleReader.enabled = true

  • spark.sql.autoBroadcastJoinThreshold = 10010241024

  • spark.sql.join.preferSortMergeJoin = false

  • spark.executor.heartbeatInterval

  • spark.sql.broadcastTimeout

  • spar.default.parallelism

  • spark.dynamicAllocation.enabled = true

  • spark.dynamicAllocation.executorIdleTimeout

  • spark.dynamicAllocation.minExecutors

  • spark.dynamicAllocation.initialExecutors

  • spark.dynamicAllocation.maxExecutors

Joins

  • Shuffle Hash
  • Broadcast Hash
  • Sort Merge

Feel free to use any content here.