
Spark Out Of Memory Error

 



OOM errors at the Spark driver level:

1. The Spark driver is the main control process of a Spark application. If it is configured with too little memory and an action such as collect() pulls the entire dataset back to the driver, the application fails with an OOM error.

2. If the table to be broadcast in a broadcast join is huge, the driver can also hit an OOM error, because the broadcast table is first materialized on the driver before it is shipped to the executors.
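
Below is a minimal driver-side sketch of these two mitigations in Scala. The path names, the 4g driver memory value and the 10 MB broadcast threshold are placeholders for illustration only, not recommendations for any particular workload.

import org.apache.spark.sql.SparkSession

object DriverOomSketch {
  def main(args: Array[String]): Unit = {
    // spark.driver.memory must be supplied when the application is launched,
    // e.g. spark-submit --driver-memory 4g ...; it cannot be raised from
    // inside an already-running driver.
    val spark = SparkSession.builder().appName("driver-oom-sketch").getOrCreate()

    // Keep auto-broadcast joins from materializing a huge table on the driver;
    // 10 MB here is only an illustrative threshold (-1 disables auto-broadcast).
    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", 10 * 1024 * 1024)

    val events = spark.read.parquet("/data/events") // placeholder input path

    // Avoid collect() on large data: it pulls every row onto the driver heap.
    // Inspect a small sample instead, and let the executors write the full result.
    events.limit(20).show()
    events.write.mode("overwrite").parquet("/data/events_out") // placeholder output path

    spark.stop()
  }
}

The same threshold can also be passed at submit time with --conf spark.sql.autoBroadcastJoinThreshold=... if you prefer to keep it out of the code.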

 

OOM errors at the Spark executor level:

1. A Spark job is executed through one or more stages, and each stage consists of multiple tasks. The number of tasks running concurrently on an executor depends on the spark.executor.cores property. If it is set to a high value without considering the memory each task needs, the Spark job fails with an OOM error.

Ideally, 3 to 5 cores are assigned to each executor (a configuration sketch follows at the end of this section).

2. In DataFrame operations, selecting more columns than the query actually needs increases the memory overhead, and this can also lead to an OOM error.

Normally, only the required columns should be selected for large DataFrame operations (see the DataFrame sketch at the end of this section).

3. An executor's memory container consists of the executor memory (heap) plus the YARN memory overhead.

The YARN memory overhead is needed for JVM overheads, interned strings and other native or metadata requirements. We need to configure spark.yarn.executor.memoryOverhead to a proper value (it is included in the configuration sketch at the end of this section).

Practically, the YARN memory overhead (off-heap memory) should be about 10% of the executor memory.

4. Spark applications that involve heavy shuffle operations such as groupBy and join can also cause OOM errors, because the data is shuffled between executors; if an executor is overloaded or spends too much time in garbage collection, it can fail with this error.

So we should prefer operations and join strategies that involve as little shuffling as possible (see the DataFrame sketch below).
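
As referenced in points 1 and 3 above, here is a minimal executor-sizing sketch in Scala, assuming YARN. Every number is a placeholder chosen to illustrate the ratios (5 cores, 10g heap, roughly 10% overhead), not a tuning recommendation for any specific cluster.

import org.apache.spark.sql.SparkSession

object ExecutorSizingSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("executor-sizing-sketch")
      // 3 to 5 concurrent tasks per executor is the usual sweet spot.
      .config("spark.executor.cores", "5")
      // Heap shared by all tasks running on one executor.
      .config("spark.executor.memory", "10g")
      // Off-heap YARN overhead in MB, roughly 10% of the executor memory.
      .config("spark.yarn.executor.memoryOverhead", "1024")
      .getOrCreate()

    // Equivalent settings at submit time:
    //   spark-submit --executor-cores 5 --executor-memory 10g \
    //     --conf spark.yarn.executor.memoryOverhead=1024 ...

    // ... job logic ...
    spark.stop()
  }
}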
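
And for points 2 and 4, a DataFrame sketch showing column pruning, a broadcast join that avoids shuffling the large side, and a built-in aggregation that does map-side partial aggregation before the shuffle. The paths and column names are made up for illustration.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

object DataFrameOomSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("dataframe-oom-sketch").getOrCreate()

    // Column pruning: read only the columns the job actually needs.
    val orders = spark.read.parquet("/data/orders") // placeholder path
      .select("order_id", "customer_id", "amount")  // hypothetical columns

    // Broadcasting the small side of a join avoids shuffling the large side.
    val customers = spark.read.parquet("/data/customers") // placeholder small table
    val joined = orders.join(broadcast(customers), Seq("customer_id"))

    // DataFrame aggregations aggregate partially on the map side before the shuffle,
    // so far less data moves between executors than with raw row shuffles.
    val totals = joined.groupBy("customer_id").sum("amount")

    totals.write.mode("overwrite").parquet("/data/customer_totals") // placeholder output
    spark.stop()
  }
}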

 

 

Hope you like my blogs!

Happy Learning!
