Answer:

Hive provides three virtual columns:

INPUT__FILE__NAME

BLOCK__OFFSET__INSIDE__FILE

ROW__OFFSET__INSIDE__BLOCK

However, ROW__OFFSET__INSIDE__BLOCK is not available by default; you need to set hive.exec.rowoffset to true to enable it. These virtual columns can be used to track down problematic input data.
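A minimal sketch of enabling the third column for a session, assuming the hive_text table and its line column used in this answer's query. The values returned depend on your data and file format.

```sql
-- Enable ROW__OFFSET__INSIDE__BLOCK for the current session (it is off by default).
SET hive.exec.rowoffset=true;

-- Select all three virtual columns alongside the row data.
SELECT INPUT__FILE__NAME,
       BLOCK__OFFSET__INSIDE__FILE,
       ROW__OFFSET__INSIDE__BLOCK,
       line
FROM hive_text
LIMIT 5;
```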

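INPUT__FILE__NAME: the name of the input file for the mapper task.

BLOCK__OFFSET__INSIDE__FILE: the current global file position. For block-compressed files, it is the file offset of the current block, that is, the offset of the current block's first byte within the file.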

hive> SELECT INPUT__FILE__NAME, BLOCK__OFFSET__INSIDE__FILE, line
    > FROM hive_text WHERE line LIKE '%hive%' LIMIT 2;

har://file/user/hive/warehouse/hive_text/folder=docs/data.har/user/hive/warehouse/hive_text/folder=docs/README.txt 2243

har://file/user/hive/warehouse/hive_text/folder=docs/data.har/user/hive/warehouse/hive_text/folder=docs/README.txt 3646
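Since these columns are meant for tracking down bad input, one possible pattern is to count suspicious rows per input file. This is only a sketch: treating a NULL or empty line as "bad" is an assumption made for illustration, and the aliases source_file and bad_rows are hypothetical names.

```sql
-- Count suspicious rows per input file to see which file needs inspection.
-- Assumption: a NULL or empty `line` marks a problematic record.
SELECT INPUT__FILE__NAME AS source_file,
       COUNT(*) AS bad_rows
FROM hive_text
WHERE line IS NULL OR line = ''
GROUP BY INPUT__FILE__NAME
ORDER BY bad_rows DESC;
```

Once a file stands out, adding BLOCK__OFFSET__INSIDE__FILE to a follow-up query on that file narrows the search to a position within it.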

<pre><code> "“`
