What we have:

    stream.foreachRDD(rdd -> {
        JavaRDD<String> javaRDD = rdd.map(elem -> elem.value());
        Dataset<String> ds = // any transformations; how to build this from javaRDD?
        sc.read().schema(csvSchema).csv(ds).write();
    });

Ideas so far:

1) Bad, because it uses .collect() and pulls all the data to the driver:

 sc.sqlContext().createDataset(javaRDD.collect(), Encoders.STRING()) 

2) Fails with 'StringType cannot be cast to StructType'; you could map the strings yourself, but there is no StructType::fromString, so that does not even compile:

 sc.sqlContext().createDataFrame(javaRDD,String.class) 

3) Going through a bean class instead of a schema.
Bad because of the model's low flexibility and poor readability (30+ fields), and because the field order is uncontrolled: the columns come out in alphabetical order, which matters when, for example, I write Parquet for Impala.

    JavaRDD<Model> javaRDD = rdd.map(elem -> new Model(elem.value()));
    sc.sqlContext().createDataFrame(javaRDD, Model.class);
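One option that avoids both collect() and a bean class is to go through SparkSession.createDataset on the underlying RDD and then let the CSV reader apply the schema. This is only a sketch under the assumption of Spark 2.2+, where DataFrameReader.csv accepts a Dataset<String>; the names sc and csvSchema follow the snippets above:

```java
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.StructType;

public class RddToDataset {
    // sc and csvSchema are assumed to exist as in the question
    public static Dataset<Row> parse(SparkSession sc, JavaRDD<String> javaRDD, StructType csvSchema) {
        // createDataset takes the underlying RDD plus an Encoder: no driver-side collect()
        Dataset<String> lines = sc.createDataset(javaRDD.rdd(), Encoders.STRING());
        // Spark 2.2+: DataFrameReader.csv(Dataset<String>) parses the in-memory CSV lines
        return sc.read().schema(csvSchema).csv(lines);
    }
}
```

The resulting columns follow the order declared in csvSchema rather than alphabetical order, which addresses the Parquet-for-Impala concern from option 3.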
  • What prevents you from using Structured Streaming and working directly through the DataFrame/Dataset API? – Alex Chermenin
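For completeness, the Structured Streaming route the comment suggests could look roughly like the sketch below. It assumes Spark 3.0+ (for functions.from_csv with a StructType) and uses placeholder Kafka options; csvSchema is the schema from the question:

```java
import java.util.HashMap;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.StructType;
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.from_csv;

public class StructuredCsvStream {
    // Hypothetical servers/topic values; csvSchema as in the question
    public static Dataset<Row> read(SparkSession spark, StructType csvSchema) {
        Dataset<Row> raw = spark.readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "host:9092") // placeholder
                .option("subscribe", "topic")                    // placeholder
                .load();
        return raw
                // Kafka value arrives as binary; cast it to the CSV line
                .selectExpr("CAST(value AS STRING) AS line")
                // from_csv applies csvSchema to each line (Spark 3.0+)
                .select(from_csv(col("line"), csvSchema, new HashMap<>()).as("data"))
                .select("data.*");
    }
}
```

This keeps everything in the DataFrame/Dataset API with no foreachRDD at all, at the cost of moving off the DStream-based job.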
