2023-03-16

PySpark fails to read a CSV file: WARN FileStreamSink: Assume no metadata directory. java.net.ConnectException: Connection refused

I'm trying to read a CSV file using PySpark.
Before I installed Hadoop, I was able to read the file with PySpark. However, after I switched my Java version to Java 8 and installed Hadoop, I can no longer read the file, and the error shown below appears. During installation I only changed the Java version and JAVA_HOME, but it seems that some settings have changed. I'm new to PySpark and Hadoop. I looked into the "Connection refused" error but still have no idea what is wrong. Could you please tell me where I should start checking?

Thank you very much in advance!

23/03/15 18:52:47 WARN FileStreamSink: Assume no metadata directory. Error while looking for metadata directory in the path: ../sales.csv.
java.net.ConnectException: Call From MacBook-Pro-890.local/127.0.0.1 to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
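A first check suggested by the message above is whether anything is listening on localhost:9000 at all, since "Connection refused" usually means no process is accepting connections on that port. The following is a minimal sketch using only the Python standard library; the helper name `port_is_open` is hypothetical, and the host and port are taken from the error message.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError)
        # when nothing accepts connections on the given port.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Host and port copied from the error message (localhost:9000).
    print("localhost:9000 reachable:", port_is_open("localhost", 9000))
```

If this prints `False`, the HDFS NameNode is probably not running (or is bound to a different address), which would be consistent with the exception above.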

My configuration files are below, for reference:
core-site.xml

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
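The `fs.defaultFS` setting above may explain why a relative path such as `../sales.csv` ends up contacting localhost:9000: assuming Spark picks up this file through HADOOP_CONF_DIR, a path without a URI scheme is resolved against the default filesystem, while a path with an explicit scheme is not. The sketch below only illustrates that rule in plain Python; `filesystem_for` is a hypothetical helper, not a Hadoop or Spark API, and it ignores details such as working-directory resolution.

```python
from urllib.parse import urlparse


def filesystem_for(path, default_fs):
    """Return the scheme://authority that a path would be resolved against.

    Illustrative only: a path with its own scheme keeps it, otherwise the
    scheme and authority of default_fs (fs.defaultFS) are used.
    """
    parsed = urlparse(path)
    if parsed.scheme:
        return f"{parsed.scheme}://{parsed.netloc}"
    default = urlparse(default_fs)
    return f"{default.scheme}://{default.netloc}"


if __name__ == "__main__":
    default_fs = "hdfs://localhost:9000"  # value from core-site.xml
    # No scheme: falls back to the default filesystem (HDFS).
    print(filesystem_for("../sales.csv", default_fs))
    # Explicit file:// scheme: stays on the local filesystem.
    # The absolute path below is a made-up example.
    print(filesystem_for("file:///tmp/sales.csv", default_fs))
```

Under that assumption, two things are worth checking: whether HDFS is actually running, and whether the read succeeds when the CSV is given as an explicit `file://` URI with an absolute path.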

hdfs-site.xml

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

.zshrc

## JAVA env
export PATH="/usr/local/mysql/bin:$PATH"
export JAVA_HOME="$(/usr/libexec/java_home -v 1.8)"

## MAVEN env
export M2_HOME=/usr/local/apache-maven-3.9.0
export PATH=${PATH}:${M2_HOME}/bin

## Homebrew env
export PATH=/opt/homebrew/bin:$PATH
export PATH=/opt/homebrew/sbin:$PATH

export PYSPARK_DRIVER_PYTHON="jupyter"
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"

export SPARK_HOME=/opt/homebrew/Cellar/apache-spark/3.3.2/libexec 
export PATH=$SPARK_HOME/bin:$PATH 

export SPARK_OPTS="--packages graphframes:graphframes:0.8.2-spark3.0-s_2.12"

## HADOOP env variables
export HADOOP_HOME="/opt/homebrew/Cellar/hadoop/3.3.4/libexec"
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
export HADOOP_CLASSPATH=${JAVA_HOME}/lib/tools.jar
