What we have:

    stream.foreachRDD(rdd -> {
        JavaRDD<String> javaRDD = rdd.map(elem -> elem.value());
        Dataset<String> ds = // any transformations; how to build this from javaRDD?
        sc.read().schema(csvSchema).csv(ds).write();
    });

Ideas so far:

1) Bad, because it uses .collect() and pulls all the data to the driver:

 sc.sqlContext().createDataset(javaRDD.collect(), Encoders.STRING()) 

2) Fails with 'StringType cannot be cast to StructType'; you could map the strings yourself, but there is no StructType::fromString, so that does not even compile:

 sc.sqlContext().createDataFrame(javaRDD,String.class) 

3) Going through a bean class instead of a schema.
Bad because of the model's low flexibility and poor readability (30+ fields), and because the field order is uncontrolled: the columns come out in alphabetical order, which matters when, for example, I write Parquet for Impala.

    JavaRDD<Model> javaRDD = rdd.map(elem -> new Model(elem.value()));
    sc.sqlContext().createDataFrame(javaRDD, Model.class);
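One option that avoids both collect() and a bean class is to go through SparkSession.createDataset on the underlying RDD and then let the CSV reader apply the schema. This is only a sketch under the assumption of Spark 2.2+, where DataFrameReader.csv accepts a Dataset<String>; the names sc and csvSchema follow the snippets above:

```java
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.StructType;

public class RddToDataset {
    // sc and csvSchema are assumed to exist as in the question
    public static Dataset<Row> parse(SparkSession sc, JavaRDD<String> javaRDD, StructType csvSchema) {
        // createDataset takes the underlying RDD plus an Encoder: no driver-side collect()
        Dataset<String> lines = sc.createDataset(javaRDD.rdd(), Encoders.STRING());
        // Spark 2.2+: DataFrameReader.csv(Dataset<String>) parses the in-memory CSV lines
        return sc.read().schema(csvSchema).csv(lines);
    }
}
```

The resulting columns follow the order declared in csvSchema rather than alphabetical order, which addresses the Parquet-for-Impala concern from option 3.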
  • What prevents you from using Structured Streaming and working directly through the DataFrame/Dataset API? – Alex Chermenin
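For completeness, the Structured Streaming route the comment suggests could look roughly like the sketch below. It assumes Spark 3.0+ (for functions.from_csv with a StructType) and uses placeholder Kafka options; csvSchema is the schema from the question:

```java
import java.util.HashMap;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.StructType;
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.from_csv;

public class StructuredCsvStream {
    // Hypothetical servers/topic values; csvSchema as in the question
    public static Dataset<Row> read(SparkSession spark, StructType csvSchema) {
        Dataset<Row> raw = spark.readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "host:9092") // placeholder
                .option("subscribe", "topic")                    // placeholder
                .load();
        return raw
                // Kafka value arrives as binary; cast it to the CSV line
                .selectExpr("CAST(value AS STRING) AS line")
                // from_csv applies csvSchema to each line (Spark 3.0+)
                .select(from_csv(col("line"), csvSchema, new HashMap<>()).as("data"))
                .select("data.*");
    }
}
```

This keeps everything in the DataFrame/Dataset API with no foreachRDD at all, at the cost of moving off the DStream-based job.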
