Tables saved with the Spark SQL DataFrame.saveAsTable method are not compatible with Hive

Writing a DataFrame directly to a Hive table with saveAsTable creates a table that Hive cannot read; the metadata stored in the metastore can be correctly interpreted only by Spark. For example:

val hsc = new HiveContext(sc)
import hsc.implicits._
val df = sc.parallelize(data).toDF()
df.write.format("parquet").saveAsTable(tableName)

creates a table with this metadata:

inputFormat:org.apache.hadoop.mapred.SequenceFileInputFormat 
outputFormat:org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
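One way to confirm what the metastore recorded is to inspect the table's metadata through the same HiveContext. This is a hedged sketch, not part of the original steps; it assumes the hsc and tableName values from the snippet above:

```scala
// Print the table metadata as Hive sees it; the InputFormat/OutputFormat
// rows will show the SequenceFile classes listed above rather than Parquet.
hsc.sql(s"DESCRIBE FORMATTED $tableName").collect().foreach(println)
```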

This also occurs when using an explicit schema, such as:

val schema = StructType(Seq(...)) 
val data = sc.parallelize(Seq(Row(...), …)) 
val df = hsc.createDataFrame(data, schema) 
df.write.format("parquet").saveAsTable(tableName)

Workaround: Explicitly create a Hive table to store the data. For example:
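A minimal sketch of this workaround, assuming a DataFrame df built against the HiveContext hsc from the earlier snippets; the table name my_table and the columns id and name are illustrative assumptions, not part of the original example:

```scala
// Register the DataFrame so it can be referenced from HiveQL.
df.registerTempTable("df_tmp")

// Create the Hive table explicitly so the metastore records
// Parquet input/output formats that Hive can interpret.
hsc.sql("""
  CREATE TABLE my_table (id INT, name STRING)
  STORED AS PARQUET
""")

// Copy the DataFrame's rows into the Hive-managed table.
hsc.sql("INSERT INTO TABLE my_table SELECT * FROM df_tmp")
```

After this, both Hive and Spark read the table correctly, because the table definition came from HiveQL rather than from saveAsTable.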

