- Move into the project directory:
cd docker_chaM3Leon
- Create the following directories inside docker_chaM3Leon:
mkdir jars hadoop spark_jars
- Ensure that docker_chaM3Leon and its subdirectories have the correct permissions:
chmod -R 777 .
- Run the following command to build the containers:
docker compose build
- Launch the environment in detached mode:
docker compose up -d
- Place the application .jar file in the docker_chaM3Leon/jars directory.
- Extract the Hadoop configuration files from the NameNode container and save them locally in the hadoop directory:
docker cp docker_cham3leon-namenode-1:/opt/hadoop/etc/hadoop/. ./hadoop
- Copy the Spark JAR files from the Spark Master container to the spark_jars directory:
docker cp docker_cham3leon-spark-master-1:/opt/bitnami/spark/jars/. ./spark_jars
- Restart the Docker environment to apply changes:
docker compose restart
- From the NameNode terminal, run the following commands:
hdfs dfs -mkdir -p /user/spark/eventLog
hdfs dfs -mkdir /spark
hdfs dfs -mkdir /spark/jars
hdfs dfs -mkdir /spark/logs
hdfs dfs -put /opt/hadoop/dfs/spark/jars/* /spark/jars
- Ensure Spark has the required permissions and ownership:
hdfs dfs -chmod -R 777 /user/spark/eventLog
hdfs dfs -chown -R spark:hadoop /spark
hdfs dfs -chmod -R 777 /spark
- Temporary option (for testing only; this opens the entire filesystem and should not be used in production):
hdfs dfs -chmod -R 777 /
hdfs dfs -chown -R spark:hadoop /
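The HDFS preparation steps above can be collected into a single script run from the NameNode terminal. A minimal sketch; the hdfs shell function below is a dry-run shim that only prints each command so the script can be reviewed outside the cluster — delete it to execute for real:

```shell
#!/bin/sh
set -e
# Dry-run shim: prints each hdfs command instead of executing it.
# Remove this function to run the commands against the real cluster.
hdfs() { echo hdfs "$@"; }

# Directories Spark expects for event logs and jars.
hdfs dfs -mkdir -p /user/spark/eventLog
hdfs dfs -mkdir -p /spark/jars /spark/logs
# Stage the Spark jars into HDFS.
hdfs dfs -put /opt/hadoop/dfs/spark/jars/* /spark/jars
# Give the spark user ownership and open permissions (sandbox only).
hdfs dfs -chown -R spark:hadoop /spark
hdfs dfs -chmod -R 777 /user/spark/eventLog /spark
```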
- From the Kafka container terminal, create topics using the following command:
kafka-topics --create --topic <topic_name> --partitions <num_partitions> --replication-factor <replication_factor> --bootstrap-server kafka1:19092
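For illustration, a filled-in invocation with hypothetical values (topic name sensor-events, 3 partitions, replication factor 1 — the replication factor cannot exceed the number of brokers):

```shell
# Hypothetical example values; substitute your own topic settings.
TOPIC=sensor-events
PARTITIONS=3
REPLICATION=1   # must not exceed the number of Kafka brokers
CMD="kafka-topics --create --topic $TOPIC --partitions $PARTITIONS --replication-factor $REPLICATION --bootstrap-server kafka1:19092"
echo "$CMD"     # prints the assembled command; run it inside the Kafka container
```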
- Run the application on the standalone Spark cluster (a spark:// master URL selects standalone mode, not local mode):
spark-submit --class <main_class> --master spark://spark-master:7077 ./extra_jars/<application_name>.jar
- Run the application in YARN cluster mode:
spark-submit --class <main_class> --master yarn --deploy-mode cluster ./extra_jars/<application_name>.jar
- Run the application in YARN client mode:
spark-submit --class <main_class> --master yarn --deploy-mode client ./extra_jars/<application_name>.jar
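As a concrete example of filling in the placeholders, submitting a class in YARN cluster mode — the class name com.example.StreamingJob and jar name app.jar below are hypothetical:

```shell
# Placeholder class and jar names, for illustration only.
MAIN_CLASS=com.example.StreamingJob
APP_JAR=./extra_jars/app.jar
SUBMIT="spark-submit --class $MAIN_CLASS --master yarn --deploy-mode cluster $APP_JAR"
echo "$SUBMIT"   # prints the assembled command; run it where spark-submit is available
```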
- To retrieve the logs of an application:
yarn logs -applicationId <applicationId> -log_files_pattern stderr
- Logs are accessible in the following directory:
/var/log/hadoop/userlogs/<applicationId>/<containerId>/stderr
- Copy the logs from the Hadoop container to the local machine:
docker cp <container_id>:/var/log/hadoop/userlogs/<applicationId>/<containerId>/stderr ./local_dir