Syncing SQL Server data to Hive with Sqoop

  1. Place the driver sqljdbc4-3.0.jar in the directory /var/lib/sqoop
  2. Run the following command:
     sqoop import \
       --connect 'jdbc:sqlserver://;database=Reports;username=xxxx;password=xxxx' \
       --query "select t1.*, convert(varchar(10), paid_time, 23) as date
                from orders t1
                where convert(varchar(10), paid_time, 23) between '2016-06-01' and '2016-06-03'
                and \$CONDITIONS" \
       --fields-terminated-by "\001" --lines-terminated-by "\n" \
       -m 4 --split-by store_nbr \
       --target-dir /ecommerce/orders_tmp --delete-target-dir \
       --null-non-string 0 --hive-drop-import-delims \
       --hive-import --hive-overwrite --create-hive-table \
       --hive-database fresh --hive-table orders_tmp
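  3. The -m 4 parameter: the number of map tasks to run in parallel. It is generally less than or equal to the number of DataNodes.
  4. The --split-by store_nbr parameter: the column used to divide the rows among the map tasks. It should be an integer column, and the more contiguous its values are, the better, because the rows are then spread more evenly across the tasks.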
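
To see why contiguous values in the split column matter, here is a minimal sketch of the range arithmetic behind --split-by and -m. It is a simplified approximation, not Sqoop's actual splitter code: Sqoop first queries the minimum and maximum of the split column, then gives each map task one slice of that range in place of $CONDITIONS. The function name split_ranges and the sample bounds 1 and 100 are made up for illustration.

```shell
#!/bin/sh
# Simplified sketch of even range splitting (NOT Sqoop's real implementation).
# Usage: split_ranges MIN MAX NUM_MAPPERS [COLUMN]
# Prints one WHERE fragment per map task. The body runs in a subshell
# (parentheses) so the helper variables do not leak into the caller.
split_ranges() (
  min=$1; max=$2; n=$3; col=${4:-store_nbr}
  span=$(( max - min + 1 ))          # number of integer values in [min, max]
  i=0
  while [ "$i" -lt "$n" ]; do
    lo=$(( min + span * i / n ))
    hi=$(( min + span * (i + 1) / n ))
    if [ "$i" -eq $(( n - 1 )) ]; then
      # The last slice is closed on the right so the maximum value is included.
      echo "$col >= $lo AND $col <= $max"
    else
      echo "$col >= $lo AND $col < $hi"
    fi
    i=$(( i + 1 ))
  done
)

# Hypothetical bounds: store_nbr runs from 1 to 100, split across 4 map tasks.
split_ranges 1 100 4
# store_nbr >= 1 AND store_nbr < 26
# store_nbr >= 26 AND store_nbr < 51
# store_nbr >= 51 AND store_nbr < 76
# store_nbr >= 76 AND store_nbr <= 100
```

Each slice covers the same width of values, not the same number of rows. If most orders belonged to a handful of stores, or the store numbers had large gaps, some tasks would receive far more rows than others, which is why a densely and evenly populated column makes a better split key.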
