Frustrated: why won't my Hive partitioned table partition correctly? Hoping someone can explain.

Hive: create a table partitioned by today's date | dynamically insert data into date partitions
Create the partitioned table in Hive, using the current date as the partition key; the HQL is as follows:
CREATE EXTERNAL TABLE IF NOT EXISTS product_sell(
    category_id BIGINT,
    province_id BIGINT,
    product_id BIGINT,
    price DOUBLE,
    sell_num BIGINT
)
PARTITIONED BY (ds string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
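As a side note, here is a sketch of the directory layout such a date partition produces. The warehouse path below is an assumption (Hive's default; an EXTERNAL table's base path may well point elsewhere):

```shell
#!/bin/bash
# Sketch only: show the HDFS directory name Hive derives for one date partition.
# /user/hive/warehouse is the default warehouse path, assumed here; for an
# EXTERNAL table the base path may be different.
table_dir=/user/hive/warehouse/product_sell
ds=2014-05-01                       # an example partition value
partition_dir="$table_dir/ds=$ds"   # Hive names partition dirs col=value
echo "$partition_dir"
```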
Then insert data, again keyed by date; the shell script is as follows:
#!/bin/bash
source /etc/
today=$(date +%Y-%m-%d)
/usr/local/cloud/hive/bin/hive<<EOF
INSERT OVERWRITE TABLE product_sell PARTITION (ds='$today')
SELECT a.category_id,
       b.good_receiver_province_id AS province_id,
       a.id AS product_id,
       (b.sell_amount/b.sell_num) AS price,
       b.sell_num
FROM product a
JOIN (SELECT si.product_id,
             s.good_receiver_province_id,
             SUM(si.order_item_amount) sell_amount,
             SUM(si.order_item_num) sell_num
      FROM so_item si
      JOIN so s ON (si.order_id = s.id)
      WHERE si.is_gift = 0
        AND datediff(date_sub(from_unixtime(unix_timestamp()), 1), si.ds) BETWEEN 0 AND 6
      GROUP BY s.good_receiver_province_id, si.product_id) b
ON (a.id = b.product_id);
EOF
The tricky part here was that I didn't know how to pass a shell date variable like this into Hive to create a date partition. Noting it down!

An error encountered with Hive dynamic partitions
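For the record, one way to hand a shell date variable to Hive without relying on heredoc expansion is `--hivevar` plus `-e`. A sketch, not the original script; the SELECT body is elided and the hive path is the one used above:

```shell
#!/bin/bash
# Sketch: pass the date via --hivevar instead of heredoc variable expansion.
# The SELECT body is deliberately elided.
today=$(date +%Y-%m-%d)
hql="INSERT OVERWRITE TABLE product_sell PARTITION (ds='\${hivevar:today}') SELECT ..."
# A real run would look something like:
#   /usr/local/cloud/hive/bin/hive --hivevar today="$today" -e "$hql"
echo "hivevar today=$today"
echo "$hql"
```

Inside the statement, `${hivevar:today}` is substituted by Hive itself, so the quoting of the SQL no longer depends on shell expansion rules.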
insert overwrite table in_yuncheng_tbshelf partition (pt)
select userid, bookid, bookname, createts, rpid, addts, updatets, isdel, rcid, category_type, wapbookmarks, addmarkts, readingchapterid, readpercentage, readingts,
substring(addts,0,10) as pt
from search_product.yuncheng_tbshelf
where pt >= ''
[Fatal Error] Operator FS_3 (id=3): Number of dynamic partitions exceeded hive.exec.max.dynamic.partitions.pernode.. Killing the job.
hive.exec.max.dynamic.partitions.pernode (default 100):
the maximum number of partitions each mapper/reducer node may create in a mapreduce job; exceeding it fails the job.
hive.exec.max.dynamic.partitions (default 1000): the maximum number of partitions a single DML statement may create in total.
hive.exec.max.created.files (default 100000): the maximum number of files all mappers and reducers of a job may create.
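For a single session these limits can be raised with SET statements before the insert. A sketch; the numeric values below are illustrative assumptions, not recommendations, and note that a fully dynamic partition insert also needs nonstrict mode:

```shell
#!/bin/bash
# Sketch: build a settings preamble for one Hive session. The limit values
# are illustrative assumptions; choose them for your own data volume.
settings='SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.max.dynamic.partitions.pernode=1000;
SET hive.exec.max.dynamic.partitions=10000;
SET hive.exec.max.created.files=100000;'
# A real run would prepend these to the statement, e.g.:
#   hive -e "$settings INSERT OVERWRITE TABLE ... PARTITION (pt) SELECT ...;"
echo "$settings"
```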
When the source table is large, the data produced inside a single mapreduce job may be widely scattered over the partition column. A simple example: suppose the table below needs 3 map tasks.
If the data is distributed like this, each map task only needs to create 1 partition:
map1 --> 1
map2 --> 2
map3 --> 3
But if the data is distributed like this, the first map task has to create 3 partitions:
map1 --> 1 2 3
map2 --> 2
map3 --> 2
To keep rows with the same partition-column value inside the same mapreduce task, so that each task creates as few new directories as possible, you can use distribute by to group rows with equal partition values together:
insert overwrite table in_yuncheng_tbshelf partition (pt)
select userid, bookid, bookname, createts, rpid, addts, updatets, isdel, rcid, category_type, wapbookmarks, addmarkts, readingchapterid, readpercentage, readingts,
substring(addts,0,10) as pt
from search_product.yuncheng_tbshelf
where pt >= ''
distribute by substring(addts,0,10)
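To see why this helps, here is a toy model in plain shell, not Hive: distribute by routes each row to a reducer by hashing the distribute-by expression, so equal partition values always land in the same task and each task opens few partition directories. The 3-bucket cksum hash below is an illustration only, not Hive's actual hash function:

```shell
#!/bin/bash
# Toy model of DISTRIBUTE BY (not Hive's real hash): a deterministic hash of
# the partition-column value picks one of 3 "reducers", so rows with equal
# values always go to the same task.
bucket_of() {
  echo $(( $(printf '%s' "$1" | cksum | cut -d' ' -f1) % 3 ))
}
for pt in 2014-05-01 2014-05-02 2014-05-01 2014-05-03; do
  echo "task$(bucket_of "$pt") <- pt=$pt"
done
```

Because the hash is deterministic, both `2014-05-01` rows print the same task number.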
Separately, I don't know whether just raising hive.exec.max.dynamic.partitions.pernode would also work; I haven't tried it yet.
superlxw1234 (Xi'an)