PySpark fails to read a CSV file: WARN FileStreamSink: Assume no metadata directory. java.net.ConnectException: Connection refused
I'm trying to read a CSV file using PySpark.
Before I installed Hadoop, I was able to read the file with PySpark. However, after I switched my Java version to Java 8 and installed Hadoop, I could no longer read the file, and the error below appeared. During installation I only changed the Java version and JAVA_HOME, but it seems that some other settings have changed as well. I'm new to PySpark and Hadoop. I looked into the "Connection refused" error but still have no idea what is wrong. Could you please tell me where I should start checking?
Thank you very much in advance!
23/03/15 18:52:47 WARN FileStreamSink: Assume no metadata directory. Error while looking for metadata directory in the path: ../sales.csv.
java.net.ConnectException: Call From MacBook-Pro-890.local/127.0.0.1 to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
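The message says the call to localhost:9000 was refused, which usually means nothing is listening on that port (for example, the HDFS NameNode is not running). As a minimal sketch of a first check, using only the Python standard library (the helper name port_is_open is made up for illustration):

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts, and unresolved host names.
        return False


if __name__ == "__main__":
    # localhost:9000 is the address taken from the error message above.
    print(port_is_open("localhost", 9000))
```

If this prints False, nothing is accepting connections on that port, so starting HDFS (for example with start-dfs.sh from $HADOOP_HOME/sbin) would be the first thing to try.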
My configuration files are below, for reference:
core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
.zshrc
## JAVA env
export PATH="/usr/local/mysql/bin:$PATH"
export JAVA_HOME="$(/usr/libexec/java_home -v 1.8)"
## MAVEN env
export M2_HOME=/usr/local/apache-maven-3.9.0
export PATH=${PATH}:${M2_HOME}/bin
## Homebrew env
export PATH=/opt/homebrew/bin:$PATH
export PATH=/opt/homebrew/sbin:$PATH
export PYSPARK_DRIVER_PYTHON="jupyter"
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"
export SPARK_HOME=/opt/homebrew/Cellar/apache-spark/3.3.2/libexec
export PATH=$SPARK_HOME/bin:$PATH
export SPARK_OPTS="--packages graphframes:graphframes:0.8.2-spark3.0-s_2.12"
## HADOOP env variables
export HADOOP_HOME="/opt/homebrew/Cellar/hadoop/3.3.4/libexec"
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
export HADOOP_CLASSPATH=${JAVA_HOME}/lib/tools.jar
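Because HADOOP_CONF_DIR is exported, Spark appears to pick up core-site.xml, so a path without a scheme such as ../sales.csv is resolved against fs.defaultFS (hdfs://localhost:9000) instead of the local disk. That would explain why the same read worked before Hadoop was installed. The sketch below, assuming the CSV lives on the local filesystem, reads fs.defaultFS from the config and builds an explicit file:// URI. The helper names default_fs_from_xml and local_file_uri are hypothetical.

```python
import os
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Optional


def default_fs_from_xml(xml_text: str) -> Optional[str]:
    """Return the fs.defaultFS value from core-site.xml text, or None if unset."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name == "fs.defaultFS":
            value = (prop.findtext("value") or "").strip()
            return value or None
    return None


def local_file_uri(path: str) -> str:
    """Turn a (possibly relative) local path into an absolute file:// URI."""
    return Path(path).resolve().as_uri()


if __name__ == "__main__":
    conf_dir = os.environ.get("HADOOP_CONF_DIR")
    if conf_dir:
        core_site = Path(conf_dir) / "core-site.xml"
        if core_site.is_file():
            print("fs.defaultFS =", default_fs_from_xml(core_site.read_text()))
    # With an explicit scheme, the path no longer depends on fs.defaultFS.
    print("local URI:", local_file_uri("../sales.csv"))
```

The resulting URI can then be passed to the reader, e.g. spark.read.csv(local_file_uri("../sales.csv"), header=True), where header=True is only an assumption about the file. Alternatively, keep the path as it is, start HDFS, and upload the file there.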