Talend España

Talend España: Madrid – Alicante – Barcelona – Valencia. Talend Expert is the specialist in implementing data integration, big data, and master data management projects with Talend. We are a Talend consulting firm trusted by leading companies around the world. A company of […]


Tutorial: Load Parquet files using Talend and Impala

Impala helps you to create, manage, and query Parquet tables. Parquet is a column-oriented binary file format intended to be highly efficient for the types of large-scale queries that Impala is best at. Parquet is especially good for queries scanning particular columns within a table, for example to query “wide” tables with many columns, or […]
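As a minimal sketch of that workflow (table and column names are hypothetical, not taken from the tutorial), a Parquet table can be created, populated, and queried from impala-shell along these lines:

  -- Create a Parquet-backed table
  CREATE TABLE sales_parquet (id INT, region STRING, amount DOUBLE)
    STORED AS PARQUET;

  -- Load it from an existing (e.g. text-format) table
  INSERT INTO sales_parquet
    SELECT id, region, amount FROM sales_text;

  -- Column-oriented layout pays off when scanning only a few columns
  SELECT region, SUM(amount)
    FROM sales_parquet
    GROUP BY region;

Because Parquet stores each column separately, the final query only reads the region and amount column chunks rather than entire rows.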


Talend Spark Parquet error

Symptoms:

  [WARN ]: org.apache.spark.scheduler.TaskSetManager – Lost task 0.0 in stage 6.0 (TID 14, clouderadXXXXX): java.lang.NullPointerException
      at parquet.column.values.fallback.FallbackValuesWriter.writeBytes(FallbackValuesWriter.java:161)
      at parquet.column.impl.ColumnWriterV1.write(ColumnWriterV1.java:204)
      at parquet.io.MessageColumnIO$MessageColumnIORecordConsumer.addBinary(MessageColumnIO.java:376)
      at parquet.example.data.simple.BinaryValue.writeValue(BinaryValue.java:45)
      at parquet.example.data.simple.SimpleGroup.writeValue(SimpleGroup.java:229)
      at parquet.example.data.GroupWriter.writeGroup(GroupWriter.java:51)
      at parquet.example.data.GroupWriter.write(GroupWriter.java:37)
      at parquet.hadoop.example.GroupWriteSupport.write(GroupWriteSupport.java:74)
      at parquet.hadoop.example.GroupWriteSupport.write(GroupWriteSupport.java:36)
      at parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:116)
      at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:123)
      at parquet.hadoop.mapred.DeprecatedParquetOutputFormat$RecordWriterWrapper.write(DeprecatedParquetOutputFormat.java:107)
      at parquet.hadoop.mapred.DeprecatedParquetOutputFormat$RecordWriterWrapper.write(DeprecatedParquetOutputFormat.java:75)
      at org.apache.spark.SparkHadoopWriter.write(SparkHadoopWriter.scala:96)
      at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1073)
      at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1059)
      at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
      at org.apache.spark.scheduler.Task.run(Task.scala:64)
      at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
      […]
