Tables saved with the Spark SQL DataFrame.saveAsTable method are not compatible with Hive

Writing a DataFrame directly to a Hive table with `saveAsTable` creates a table that is not compatible with Hive: the metadata stored in the metastore can only be interpreted correctly by Spark, not by Hive itself. For example:

val hsc = new HiveContext(sc)
import hsc.implicits._
val df = sc.parallelize(data).toDF()
df.saveAsTable("my_table")  // table name for illustration; writes Spark-only metadata

creates a table with this metadata:


This also occurs when using an explicit schema, such as:

val schema = StructType(Seq(...))
val data = sc.parallelize(Seq(Row(...), ...))
val df = hsc.createDataFrame(data, schema)
df.saveAsTable("my_table")  // table name for illustration

Workaround: Explicitly create a Hive table to store the data. For example:
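A minimal sketch of this workaround, assuming the `hsc` and `df` values from the snippets above; the table name `my_hive_table`, the column definitions, and the storage format are placeholders you would replace with your own:

val hsc = new HiveContext(sc)
val df = sc.parallelize(data).toDF()

// Expose the DataFrame to SQL so the Hive table can be populated from it
df.registerTempTable("df_tmp")

// Create the table through Hive DDL so the metastore holds standard Hive metadata
hsc.sql("""CREATE TABLE my_hive_table (id INT, name STRING)
           STORED AS PARQUET""")

// Copy the DataFrame's rows into the Hive-managed table
hsc.sql("INSERT INTO TABLE my_hive_table SELECT * FROM df_tmp")

Because the table is created by Hive DDL rather than by `saveAsTable`, both Hive and Spark can read it afterwards.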
