Spark Out Of Memory Error
OOM errors at the Spark driver level:
1. The Spark driver is the main control process of a Spark application. If it is configured with too little memory and an action such as collect() pulls the full dataset of the input files back to it, the driver throws an OOM error (see the collect() sketch after this list).
2. If the table chosen for a broadcast join is huge, the driver also hits an OOM error, because the broadcast table is first materialised on the driver before being shipped to the executors (see the broadcast sketch below).
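Below is a minimal sketch of the collect() problem from point 1, assuming a SparkSession named spark and a hypothetical input path /data/events; the safer pattern keeps the data distributed or brings back only a small slice.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("driver-oom-demo")
  .getOrCreate()
// Driver heap is sized at submit time, e.g. spark-submit --driver-memory 4g ...

val df = spark.read.parquet("/data/events")   // hypothetical input path

// Risky: collect() pulls every row of the dataset into the driver JVM.
// val everything = df.collect()

// Safer: keep the data distributed; bring back only what the driver needs,
// or write the result out instead of collecting it.
val preview = df.limit(20).collect()
df.write.mode("overwrite").parquet("/data/events_out")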
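For point 2, a sketch of keeping broadcast joins safe; it reuses the spark session from the sketch above, the table paths and join column are hypothetical, and the threshold figure is only illustrative.

import org.apache.spark.sql.functions.broadcast

// Only tables below this threshold are auto-broadcast.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", 50L * 1024 * 1024)  // ~50 MB

val facts = spark.read.parquet("/data/facts")   // large fact table
val dims  = spark.read.parquet("/data/dims")    // small lookup table

// Fine: explicitly broadcast the small side of the join.
val joined = facts.join(broadcast(dims), Seq("dim_id"))

// Risky: hinting a broadcast of the huge table forces the driver to materialise it.
// val bad = dims.join(broadcast(facts), Seq("dim_id"))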
OOM errors at the Spark executor level:
1. A Spark job executes as one or more stages, and each stage consists of multiple tasks. The number of tasks that run concurrently on an executor is governed by the spark.executor.cores property. If it is set too high without considering the memory each task needs, the job fails with an OOM error.
In practice, 3 to 5 cores per executor is a reasonable starting point (see the executor-sizing sketch after this list).
2. In DataFrame operations, selecting more columns than the query needs increases the memory footprint of every task and can also trigger OOM errors.
For large DataFrame operations, select only the required columns (see the column-pruning sketch below).
3. Each executor's YARN container comprises the executor memory (JVM heap) plus the YARN memory overhead. The overhead is off-heap memory used for JVM overheads, interned strings and other native metadata, and it is configured through spark.yarn.executor.memoryOverhead.
As a rule of thumb, the YARN memory overhead (off-heap memory) should be about 10% of the executor memory (see the overhead sketch below).
4. Applications with heavy shuffle operations such as groupBy and join are also prone to OOM errors, because shuffled data is buffered on the executors; if an executor is overloaded or stuck in long GC pauses, it can run out of memory.
Wherever possible, prefer operations that shuffle less data (see the shuffle-reduction sketch below).
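For executor-level point 1, a minimal sketch of executor sizing; the numbers are illustrative only, and these settings are normally fixed at submit time.

import org.apache.spark.sql.SparkSession

// Illustrative sizing: a handful of cores per executor, with enough heap for
// the tasks that will run on it concurrently.
val spark = SparkSession.builder()
  .appName("executor-sizing-demo")
  .config("spark.executor.cores", "5")     // 3-5 concurrent tasks per executor
  .config("spark.executor.memory", "8g")   // heap shared by those tasks
  .getOrCreate()

// Equivalent at submit time:
//   spark-submit --executor-cores 5 --executor-memory 8g ...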
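For point 2, a sketch of column pruning on a wide table; the path and column names are hypothetical and an existing SparkSession named spark is assumed.

// Risky: carrying every column of a wide table through the job inflates each task.
// val wide = spark.read.parquet("/data/transactions")

// Safer: read and carry only the columns the query actually needs.
val slim = spark.read.parquet("/data/transactions")
  .select("txn_id", "customer_id", "amount")

val perCustomer = slim.groupBy("customer_id").sum("amount")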
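For point 3, a sketch of pairing the executor heap with the off-heap overhead; the values are illustrative and follow the 10% rule of thumb above.

import org.apache.spark.sql.SparkSession

// With an 8g executor heap, roughly 10% extra is reserved off-heap for JVM
// overheads, interned strings and other native allocations.
val spark = SparkSession.builder()
  .appName("memory-overhead-demo")
  .config("spark.executor.memory", "8g")
  .config("spark.yarn.executor.memoryOverhead", "820")  // in MiB, ~10% of 8g
  .getOrCreate()

// On Spark 2.3+ the same setting is also exposed as spark.executor.memoryOverhead.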
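For point 4, a sketch of two ways to ease shuffle pressure: aggregating before a join, and preferring reduceByKey (which combines map-side) over groupByKey on RDDs. The paths and column names are hypothetical and an existing SparkSession named spark is assumed.

// DataFrame side: aggregate before joining so far less data crosses the shuffle.
val orders    = spark.read.parquet("/data/orders")
val customers = spark.read.parquet("/data/customers")

val orderTotals = orders.groupBy("customer_id").sum("amount")   // shrink first
val enriched    = orderTotals.join(customers, Seq("customer_id"))

// RDD side: reduceByKey combines values map-side before shuffling, whereas
// groupByKey ships every individual value across the network.
val pairs  = orders.rdd.map(r => (r.getAs[String]("customer_id"), r.getAs[Double]("amount")))
val totals = pairs.reduceByKey(_ + _)        // preferred
// val grouped = pairs.groupByKey()          // heavier shuffle; avoid when possible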
Hope you like my blogs!
Happy Learning!