Checking spark-submit Hardware Options
Data Engineering/Apache Spark | 2022. 2. 7. 14:44
Reference page (calculation formulas):
https://aws.amazon.com/ko/blogs/big-data/best-practices-for-successfully-managing-memory-for-apache-spark-applications-on-amazon-emr/
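In short, the AWS post fixes spark.executor.cores at 5 and derives everything else per instance. A rough summary of the calculations the script below implements:

executors per instance = (total cores - 1) / 5                            (one core reserved for OS/Hadoop daemons)
executor memory        = (total memory / executors per instance) * 0.9    (10% left for YARN memory overhead)
num-executors          = executors per instance * node count - 1          (one slot reserved for the driver)
default parallelism    = num-executors * executor cores * 2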
Computed options (the sketch after this list shows how they map onto a spark-submit command):
executor-cores
executor-memory
spark.driver.cores
driver-memory
num-executors
spark.default.parallelism
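A minimal sketch of a spark-submit invocation using these values, reusing the variables computed by the script further below (so it only works appended to that script; the jar name is a placeholder):

spark-submit \
  --master yarn \
  --executor-cores "${SPARK_EXECUTORS_CORES}" \
  --executor-memory "${SPARK_EXECUTOR_MEMORY}G" \
  --driver-memory "${SPARK_DRIVER_MEMORY}G" \
  --num-executors "${SPARK_INSTANCE}" \
  --conf "spark.driver.cores=${SPARK_DRIVER_CORES}" \
  --conf "spark.default.parallelism=${SPARK_DEFAULT_PARALLELISM}" \
  your-app.jar  # placeholder application jar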
How to check CPU cores and memory on Linux
# Check CPU cores
lscpu
# Check memory
free -h
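If you would rather not read the values off by hand, a sketch that derives the script's two inputs directly from the host (assumes the standard nproc and free utilities):

# Total logical cores
CPU_CORE=$(nproc)
# Total memory in GB, taken from the "Mem:" row of free -g
MEMORY=$(free -g | awk '/^Mem:/ {print $2}')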
Source (Shell)
#!/usr/bin/env bash

# Total number of CPU cores on the server
CPU_CORE=64
# Total server memory, in GB
MEMORY=40
# Fraction of memory Spark may use; the rest is YARN overhead memory (1.0 - overhead)
SPARK_MEMORY_FRACTION=0.9
# Number of Spark nodes in the cluster
SPARK_NODE_COUNT=1
# CPU cores per executor; 5 is widely cited as the optimal constant
SPARK_EXECUTORS_CORES=5

# Calculations
# Reserve one core per node for the OS/Hadoop daemons
NUMBER_OF_EXECUTORS_PER_INSTANCE=$(expr $(expr ${CPU_CORE} - 1) / ${SPARK_EXECUTORS_CORES})
# Memory available to each executor, before the Spark fraction is applied
SPARK_EXECUTOR_MEMORY_1=$(echo "${MEMORY}/${NUMBER_OF_EXECUTORS_PER_INSTANCE}" | bc)
# Apply the fraction and round to the nearest whole GB
SPARK_EXECUTOR_MEMORY=$(printf '%.0f' $(echo "scale=0;${SPARK_EXECUTOR_MEMORY_1}*${SPARK_MEMORY_FRACTION}" | bc -l))
SPARK_DRIVER_MEMORY=${SPARK_EXECUTOR_MEMORY}
SPARK_DRIVER_CORES=${SPARK_EXECUTORS_CORES}
# Leave one executor slot for the driver
SPARK_INSTANCE=$(echo "${NUMBER_OF_EXECUTORS_PER_INSTANCE}*${SPARK_NODE_COUNT}-1" | bc)
SPARK_DEFAULT_PARALLELISM=$(echo "${SPARK_INSTANCE}*${SPARK_EXECUTORS_CORES}*2" | bc)

echo "------- spark-submit Options -------"
echo "executor-cores = ${SPARK_EXECUTORS_CORES}"
echo "executor-memory = ${SPARK_EXECUTOR_MEMORY}"
echo "spark.driver.cores = ${SPARK_DRIVER_CORES}"
echo "driver-memory = ${SPARK_DRIVER_MEMORY}"
echo "num-executors = ${SPARK_INSTANCE}"
echo "spark.default.parallelism = ${SPARK_DEFAULT_PARALLELISM}"
echo "------- spark-submit Options -------"
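With the defaults above (64 cores, 40 GB, one node), the arithmetic works out to (64 - 1) / 5 = 12 executors per instance; 40 GB / 12 truncates to 3 GB, and 3 * 0.9 = 2.7 rounds back to 3 GB per executor; 12 * 1 - 1 = 11 executors after reserving one slot for the driver; and 11 * 5 * 2 = 110 for parallelism. So the script should print:

------- spark-submit Options -------
executor-cores = 5
executor-memory = 3
spark.driver.cores = 5
driver-memory = 3
num-executors = 11
spark.default.parallelism = 110
------- spark-submit Options -------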