Sunday, September 14, 2014

SQOOP import export from SQL Server

While exporting data from hdfs to SQL Server using Sqoop ran into a problem with sqoop not reading data properly and as a result in export erroring out.

Command I was using:
sqoop export --connect "jdbc:sqlserver://192.168.84.1:1433;database=myDB;username=dbuser;password=*******" --table TransactionsFromHadoop --export-dir /apps/hive/warehouse/transactions


This resulted in foll. error:
Caused by: java.lang.NumberFormatException: For input string: "10007134166102007-09-01 00:00:00.0W2200.00002007-09-01 00:00:00.0"

The data I was trying to export had originally been imported from SQL Server to hadoop using sqoop and while importing sqoop uses a default column delimited of ^A (octal representation of  '\001' ). So to get this to work I had to specify the delimiter using -input-fields-terminated-by

sqoop export --connect "jdbc:sqlserver://192.168.84.1:1433;database=myDB;username=dbuser;password=*******" --table TransactionsFromHadoop --export-dir /apps/hive/warehouse/transactions --input-fields-terminated-by ^A --lines-terminated-by '\n'
OR

sqoop export --connect "jdbc:sqlserver://192.168.84.1:1433;database=myDB;username=dbuser;password=*******" --table TransactionsFromHadoop --export-dir /apps/hive/warehouse/transactions --input-fields-terminated-by '\001' --lines-terminated-by '\n'


References:
Import from Microsoft SQL Server into the Hortonworks Sandbox using Sqoop
Sqoop User Guide