April 2016 - Silicon Tern

HBase Basics Quiz

1. What is HBase?
   - Distributed scalable database
   - Distributed non-relational, open source database
   - Distributed row-oriented database
   - Distributed scalable Big Data store

2. HBase and HDFS
   - Are both great for batch processing
   - HBase is designed for sequential access only
   - HDFS is designed for fast lookups for large tables
   - All of the above
   - None of the above

3. HBase (mark all that apply)
   - Provides fast lookups for large tables
   - Enables random access
   - Enables low-latency access to single rows from billions…

Read More >

How to use Talend with the Dimelo API?

How do you use Talend with the Dimelo API? TalendExpert.com can help you ingest your data from Dimelo using Talend ETL. Our teams have successfully developed and delivered this integration, and we have built many reusable, ready-to-use template jobs for Dimelo integration. We also offer ready-to-use components that extract and update data through the Dimelo API using very simple connectors in Talend; no coding skills are needed to integrate your data via the API. These components are not available in the Exchange community and are provided on demand only….

Read More >

Tables saved with the Spark SQL DataFrame.saveAsTable method are not compatible with Hive

Writing a DataFrame directly to a Hive table creates a table that is not compatible with Hive; the metadata stored in the metastore can only be correctly interpreted by Spark. For example:

    val hsc = new HiveContext(sc)
    import hsc.implicits._
    val df = sc.parallelize(data).toDF()
    df.write.format("parquet").saveAsTable(tableName)

creates a table with this metadata:

    inputFormat:org.apache.hadoop.mapred.SequenceFileInputFormat
    outputFormat:org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat

This also occurs when using an explicit schema, such as:

    val schema = StructType(Seq(…))
    val data = sc.parallelize(Seq(Row(…), …))
    val df = hsc.createDataFrame(data, schema)
    df.write.format("parquet").saveAsTable(tableName)

Workaround:…
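The excerpt cuts off before the workaround itself. One commonly used approach (a sketch under assumptions, not necessarily the exact fix the full post describes) is to create the table through HiveQL so the metastore records Hive-readable Parquet input/output formats, and then populate it from the DataFrame. The table name `demo_table` and its columns are hypothetical, and `sc` is assumed to be an existing SparkContext:

    import org.apache.spark.sql.hive.HiveContext

    val hsc = new HiveContext(sc)
    import hsc.implicits._

    // Hypothetical sample data for illustration.
    val df = sc.parallelize(Seq((1, "a"), (2, "b"))).toDF("id", "name")

    // Declare the table in Hive's own DDL instead of saveAsTable, so
    // Hive-compatible Parquet metadata lands in the metastore...
    hsc.sql(
      "CREATE TABLE IF NOT EXISTS demo_table (id INT, name STRING) STORED AS PARQUET")

    // ...then load it via an INSERT; the result is readable from Hive.
    df.registerTempTable("demo_source")
    hsc.sql("INSERT OVERWRITE TABLE demo_table SELECT id, name FROM demo_source")

This keeps Spark as the writer while letting Hive own the table definition, which avoids the SequenceFile metadata shown above.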

Read More >