Frustrated: why won't my Hive partitioned table partition correctly? Hoping someone can explain.

Hive: create a table partitioned by today's date | dynamically insert data into date partitions
Create the partitioned table in Hive, using the current date as the partition key; the HQL is as follows:
CREATE EXTERNAL TABLE IF NOT EXISTS product_sell(
    category_id BIGINT,
    province_id BIGINT,
    product_id BIGINT,
    price DOUBLE,
    sell_num BIGINT
)
PARTITIONED BY (ds string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
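As a side note, here is a sketch of the directory layout such a date partition produces. The warehouse path below is an assumption (Hive's default; an EXTERNAL table's base path may well point elsewhere):

```shell
#!/bin/bash
# Sketch only: show the HDFS directory name Hive derives for one date partition.
# /user/hive/warehouse is the default warehouse path, assumed here; for an
# EXTERNAL table the base path may be different.
table_dir=/user/hive/warehouse/product_sell
ds=2014-05-01                       # an example partition value
partition_dir="$table_dir/ds=$ds"   # Hive names partition dirs col=value
echo "$partition_dir"
```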
Then insert data, again keyed by date; the shell script is as follows:
#!/bin/bash
source /etc/
today=$(date +%Y-%m-%d)
/usr/local/cloud/hive/bin/hive<<EOF
INSERT OVERWRITE TABLE product_sell PARTITION (ds='$today')
SELECT a.category_id,
       b.good_receiver_province_id AS province_id,
       a.id AS product_id,
       (b.sell_amount/b.sell_num) AS price,
       b.sell_num
FROM product a
JOIN (SELECT si.product_id,
             s.good_receiver_province_id,
             SUM(si.order_item_amount) sell_amount,
             SUM(si.order_item_num) sell_num
      FROM so_item si
      JOIN so s ON (si.order_id = s.id)
      WHERE si.is_gift = 0
        AND datediff(date_sub(from_unixtime(unix_timestamp()), 1), si.ds) BETWEEN 0 AND 6
      GROUP BY s.good_receiver_province_id, si.product_id) b
ON (a.id = b.product_id);
EOF
The tricky part here was that I didn't know how to pass a shell date variable like this into Hive to create a date partition. Noting it down!

An error encountered with Hive dynamic partitions
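For the record, one way to hand a shell date variable to Hive without relying on heredoc expansion is `--hivevar` plus `-e`. A sketch, not the original script; the SELECT body is elided and the hive path is the one used above:

```shell
#!/bin/bash
# Sketch: pass the date via --hivevar instead of heredoc variable expansion.
# The SELECT body is deliberately elided.
today=$(date +%Y-%m-%d)
hql="INSERT OVERWRITE TABLE product_sell PARTITION (ds='\${hivevar:today}') SELECT ..."
# A real run would look something like:
#   /usr/local/cloud/hive/bin/hive --hivevar today="$today" -e "$hql"
echo "hivevar today=$today"
echo "$hql"
```

Inside the statement, `${hivevar:today}` is substituted by Hive itself, so the quoting of the SQL no longer depends on shell expansion rules.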
insert overwrite table in_yuncheng_tbshelf partition (pt)
select userid, bookid, bookname, createts, rpid, addts, updatets, isdel, rcid, category_type, wapbookmarks, addmarkts, readingchapterid, readpercentage, readingts,
substring(addts,0,10) as pt
from search_product.yuncheng_tbshelf
where pt >= ''
[Fatal Error] Operator FS_3 (id=3): Number of dynamic partitions exceeded hive.exec.max.dynamic.partitions.pernode.. Killing the job.
hive.exec.max.dynamic.partitions.pernode (default 100):
the maximum number of partitions each mapper/reducer node may create in a mapreduce job; exceeding it fails the job.
hive.exec.max.dynamic.partitions (default 1000): the maximum number of partitions a single DML statement may create in total.
hive.exec.max.created.files (default 100000): the maximum number of files all mappers and reducers of a job may create.
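For a single session these limits can be raised with SET statements before the insert. A sketch; the numeric values below are illustrative assumptions, not recommendations, and note that a fully dynamic partition insert also needs nonstrict mode:

```shell
#!/bin/bash
# Sketch: build a settings preamble for one Hive session. The limit values
# are illustrative assumptions; choose them for your own data volume.
settings='SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.max.dynamic.partitions.pernode=1000;
SET hive.exec.max.dynamic.partitions=10000;
SET hive.exec.max.created.files=100000;'
# A real run would prepend these to the statement, e.g.:
#   hive -e "$settings INSERT OVERWRITE TABLE ... PARTITION (pt) SELECT ...;"
echo "$settings"
```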
When the source table is large, the data produced inside a single mapreduce job may be widely scattered over the partition column. A simple example: suppose the table below needs 3 map tasks.
If the data is distributed like this, each map task only needs to create 1 partition:
map1 --> 1
map2 --> 2
map3 --> 3
But if the data is distributed like this, the first map task has to create 3 partitions:
map1 --> 1 2 3
map2 --> 2
map3 --> 2
To keep rows with the same partition-column value inside the same mapreduce task, so that each task creates as few new directories as possible, you can use distribute by to group rows with equal partition values together:
insert overwrite table in_yuncheng_tbshelf partition (pt)
select userid, bookid, bookname, createts, rpid, addts, updatets, isdel, rcid, category_type, wapbookmarks, addmarkts, readingchapterid, readpercentage, readingts,
substring(addts,0,10) as pt
from search_product.yuncheng_tbshelf
where pt >= ''
distribute by substring(addts,0,10)
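To see why this helps, here is a toy model in plain shell, not Hive: distribute by routes each row to a reducer by hashing the distribute-by expression, so equal partition values always land in the same task and each task opens few partition directories. The 3-bucket cksum hash below is an illustration only, not Hive's actual hash function:

```shell
#!/bin/bash
# Toy model of DISTRIBUTE BY (not Hive's real hash): a deterministic hash of
# the partition-column value picks one of 3 "reducers", so rows with equal
# values always go to the same task.
bucket_of() {
  echo $(( $(printf '%s' "$1" | cksum | cut -d' ' -f1) % 3 ))
}
for pt in 2014-05-01 2014-05-02 2014-05-01 2014-05-03; do
  echo "task$(bucket_of "$pt") <- pt=$pt"
done
```

Because the hash is deterministic, both `2014-05-01` rows print the same task number.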
Separately, I don't know whether just raising hive.exec.max.dynamic.partitions.pernode would also work; I haven't tried it yet.
superlxw1234 (Xi'an)