HDFS Cache
To improve Impala query performance, you can use the HDFS Cache feature, and you can also use multiple caches. First, create an HDFS cache pool as follows.
# hdfs cacheadmin -addPool four_gig_pool -owner impala -limit 4000000000
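The -limit value for hdfs cacheadmin -addPool is a byte count. So 4000000000 means 4 GB in decimal units (about 3.73 GiB), which matches the pool name four_gig_pool. The following minimal sketch converts a human-readable size into that byte count and builds the same command as an argument list, for example to pass to subprocess.run. The helper names (to_bytes, add_pool_command) and the accepted unit spellings are hypothetical. The sketch only builds the command and does not contact HDFS.

```python
from typing import List

# Hypothetical unit table: decimal (KB, MB, ...) and binary (KiB, MiB, ...) sizes.
UNITS = {"B": 1, "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12,
         "KIB": 2**10, "MIB": 2**20, "GIB": 2**30, "TIB": 2**40}

def to_bytes(size: str) -> int:
    """Convert a size such as '4GB' or '4GiB' into a byte count."""
    s = size.strip().upper()
    # Check longer suffixes first so that 'GIB' is not mistaken for a plain 'B'.
    for unit in sorted(UNITS, key=len, reverse=True):
        if s.endswith(unit):
            return int(float(s[: -len(unit)].strip()) * UNITS[unit])
    return int(s)  # a bare number is already a byte count

def add_pool_command(pool: str, owner: str, limit: str) -> List[str]:
    """Build the argv for `hdfs cacheadmin -addPool`; -limit is given in bytes."""
    return ["hdfs", "cacheadmin", "-addPool", pool,
            "-owner", owner, "-limit", str(to_bytes(limit))]

if __name__ == "__main__":
    cmd = add_pool_command("four_gig_pool", "impala", "4GB")
    print(" ".join(cmd))
    # On a real cluster you could run it with: subprocess.run(cmd, check=True)
```

Passing "4GB" reproduces the 4000000000 used in the command above. Passing "4GiB" would instead request 4294967296 bytes. Keeping the conversion explicit makes it obvious which of the two the pool limit actually means.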
You can enable caching for a table either when you create it or after it has been created.
-- Cache the entire table (all partitions).
alter table census set cached in 'pool_name';

-- Remove the entire table from the cache.
alter table census set uncached;

-- Cache a portion of the table (a single partition).
-- If the table is partitioned by multiple columns (such as year, month, day),
-- the ALTER TABLE command must specify values for all those columns.
alter table census partition (year=1960) set cached in 'pool_name';

-- Cache the data from one partition on up to 4 hosts, to minimize CPU load on any
-- single host when the same data block is processed multiple times.
alter table census partition (year=1970) set cached in 'pool_name' with replication = 4;

-- At each stage, check the volume of cached data.
-- For large tables or partitions, the background loading might take some time,
-- so you might have to wait and reissue the statement until all the data
-- has finished being loaded into the cache.
show table stats census;

+-------+-------+--------+------+--------------+--------+
| year  | #Rows | #Files | Size | Bytes Cached | Format |
+-------+-------+--------+------+--------------+--------+
| 1900  | -1    | 1      | 11B  | NOT CACHED   | TEXT   |
| 1940  | -1    | 1      | 11B  | NOT CACHED   | TEXT   |
| 1960  | -1    | 1      | 11B  | 11B          | TEXT   |
| 1970  | -1    | 1      | 11B  | NOT CACHED   | TEXT   |
| Total | -1    | 4      | 44B  | 11B          |        |
+-------+-------+--------+------+--------------+--------+
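Background loading can take a while, so you may need to reissue SHOW TABLE STATS until the partitions you asked to cache stop reporting NOT CACHED. The minimal sketch below parses the pipe-delimited table in the layout shown above. It lists the partitions still marked NOT CACHED, then polls until a chosen set of target partitions is cached. The function names, the polling interval, and the fetch_stats callable (for example a wrapper you write around impala-shell -q) are hypothetical. The sketch assumes the partition key columns come before #Rows. It does not detect a partition that is only partly loaded, because it never compares Bytes Cached with Size. It waits only on explicit targets because partitions you never cached, such as 1900 and 1940 here, stay NOT CACHED indefinitely.

```python
import time
from typing import Callable, Iterable, List

# Sample output copied from the SHOW TABLE STATS example above.
EXAMPLE_STATS = """\
+-------+-------+--------+------+--------------+--------+
| year  | #Rows | #Files | Size | Bytes Cached | Format |
+-------+-------+--------+------+--------------+--------+
| 1900  | -1    | 1      | 11B  | NOT CACHED   | TEXT   |
| 1940  | -1    | 1      | 11B  | NOT CACHED   | TEXT   |
| 1960  | -1    | 1      | 11B  | 11B          | TEXT   |
| 1970  | -1    | 1      | 11B  | NOT CACHED   | TEXT   |
| Total | -1    | 4      | 44B  | 11B          |        |
+-------+-------+--------+------+--------------+--------+
"""

def uncached_partitions(stats_text: str) -> List[str]:
    """Return partitions (e.g. 'year=1900') whose Bytes Cached column reads NOT CACHED."""
    header: List[str] = []
    key_count = cached_col = -1
    pending: List[str] = []
    for raw in stats_text.splitlines():
        line = raw.strip()
        if not line.startswith("|"):
            continue  # skip +---+ borders and blank lines
        cells = [c.strip() for c in line.strip("|").split("|")]
        if not header:
            header = cells
            key_count = header.index("#Rows")  # key columns precede #Rows
            cached_col = header.index("Bytes Cached")
            continue
        if cells[0] == "Total":
            continue  # summary row, not a partition
        if cells[cached_col] == "NOT CACHED":
            keys = zip(header[:key_count], cells[:key_count])
            pending.append("/".join(f"{k}={v}" for k, v in keys))
    return pending

def wait_until_cached(fetch_stats: Callable[[], str], targets: Iterable[str],
                      interval_s: float = 30.0, max_checks: int = 20) -> bool:
    """Re-run fetch_stats until none of the target partitions reports NOT CACHED."""
    wanted = set(targets)
    for attempt in range(max_checks):
        if not wanted & set(uncached_partitions(fetch_stats())):
            return True
        if attempt + 1 < max_checks:
            time.sleep(interval_s)  # background loading may still be in progress
    return False

if __name__ == "__main__":
    print(uncached_partitions(EXAMPLE_STATS))
```

With the sample output, uncached_partitions reports year=1900, year=1940, and year=1970. Waiting on ["year=1970"] therefore keeps polling until that partition's Bytes Cached column shows a value.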
For more details, see HDFS Caching with Impala.
Data Cache for Remote Reads
Add the following to the Impala Daemon Command Line Argument Advanced Configuration Snippet (Safety Valve) field, specifying the quota in a form such as 1TB.
--data_cache=dir1,dir2,dir3,...:quota
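The flag value is a comma-separated list of directories, followed by a colon and the quota. The minimal sketch below assembles that value and sanity-checks it before you paste it into the Safety Valve field. The helper name and its validation rules are assumptions made for illustration: absolute paths only, no commas or colons inside a path, and a quota written as a number plus an uppercase unit in the 1TB style. The directories /data/0 and /data/1 are hypothetical. As we understand the Impala documentation, the quota applies to each listed directory rather than to their total. Treat that as an assumption to confirm in Data Cache for Remote Reads.

```python
import re
from typing import Sequence

# Assumed quota syntax: a number plus an uppercase unit, in the "1TB" style noted above.
QUOTA_PATTERN = re.compile(r"\d+(\.\d+)?(B|KB|MB|GB|TB)")

def data_cache_flag(dirs: Sequence[str], quota: str) -> str:
    """Build the impalad startup flag --data_cache=dir1,dir2,...:quota.

    Assumption: the quota is applied to each listed directory, not to their total.
    """
    if not dirs:
        raise ValueError("at least one cache directory is required")
    for d in dirs:
        # ',' separates directories and ':' introduces the quota,
        # so neither character may appear inside a directory path here.
        if not d.startswith("/") or "," in d or ":" in d:
            raise ValueError(f"invalid cache directory: {d!r}")
    if not QUOTA_PATTERN.fullmatch(quota):
        raise ValueError(f"invalid quota (expected e.g. 1TB): {quota!r}")
    return "--data_cache=" + ",".join(dirs) + ":" + quota

if __name__ == "__main__":
    print(data_cache_flag(["/data/0", "/data/1"], "1TB"))
```

Validating the value up front catches typos such as a space in "1 TB" or a stray comma in a path before the daemon is restarted with a malformed flag.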
For more details, see Data Cache for Remote Reads.