How to create a Spark DataFrame from a txt file with multiple data points, each having a label and 4 features whose values are given like 1:0.4
I have a txt file with rows of numbers:
0.0 1:5.1 2:3.5 3:1.4 4:0.2
0.0 1:4.9 2:3.0 3:1.4 4:0.2
...
and so on.
The first value, 0.0, is a label; labels take values 0.0, 0.1, 0.2, and so on. The colon-separated pairs are the feature indices and the respective feature values.
I want to load this into a Spark DataFrame.
I tried the following, but the features come back as nulls.
from pyspark.sql.types import StructType, StructField, FloatType

# Define the schema
schema = StructType([
    StructField("label", FloatType(), nullable=False),
    StructField("feature1", FloatType(), nullable=False),
    StructField("feature2", FloatType(), nullable=False),
    StructField("feature3", FloatType(), nullable=False),
    StructField("feature4", FloatType(), nullable=False)
])
# Load the data into a DataFrame using the specified schema
df = spark.read.option("delimiter", " ").csv("file_path", schema=schema)
# Show the first few rows of the DataFrame
df.show()
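A likely cause of the nulls is that the CSV reader splits each line on spaces and then tries to cast tokens such as 1:5.1 to a float, which fails because of the index prefix. One possible workaround, sketched below, is to parse each line manually and strip the index before building the DataFrame. The names parse_line, load_dataframe, and NUM_FEATURES are hypothetical helpers introduced for illustration, and the sketch assumes every line follows the "label index:value ..." layout shown in the sample, with 1-based indices and at most 4 features.

```python
from typing import Tuple

NUM_FEATURES = 4  # assumption: 4 features per row, as in the sample data


def parse_line(line: str, num_features: int = NUM_FEATURES) -> Tuple[float, ...]:
    """Turn '0.0 1:5.1 2:3.5 3:1.4 4:0.2' into (0.0, 5.1, 3.5, 1.4, 0.2)."""
    tokens = line.split()
    label = float(tokens[0])
    # Features missing from a line stay at 0.0 (the usual sparse convention).
    features = [0.0] * num_features
    for token in tokens[1:]:
        index, value = token.split(":")
        features[int(index) - 1] = float(value)  # file indices are 1-based
    return (label, *features)


def load_dataframe(spark, path: str, num_features: int = NUM_FEATURES):
    """Hypothetical helper: read the text file and build a flat DataFrame."""
    # Imported here so parse_line can be tried without a Spark installation.
    from pyspark.sql.types import StructType, StructField, FloatType

    schema = StructType(
        [StructField("label", FloatType(), nullable=False)]
        + [
            StructField(f"feature{i}", FloatType(), nullable=False)
            for i in range(1, num_features + 1)
        ]
    )
    rows = (
        spark.sparkContext.textFile(path)
        .filter(lambda line: bool(line.strip()))  # skip blank lines
        .map(lambda line: parse_line(line, num_features))
    )
    return spark.createDataFrame(rows, schema)
```

With an existing SparkSession named spark, calling load_dataframe(spark, "file_path").show() should display the label and the four feature columns. If a single vector column is acceptable instead of four flat columns, Spark's built-in LIBSVM data source, spark.read.format("libsvm").option("numFeatures", "4").load("file_path"), reads this layout directly into a label column and a features vector column.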