Big Data | Manage Hive Databases
  1. References
  2. Internal (managed) Tables vs External Tables
  3. Supported file formats
  4. Create a database (using external tables)

  1. References
    See these pages for more details about Apache Hive:
    https://cwiki.apache.org/confluence/display/Hive/GettingStarted
    https://cwiki.apache.org/confluence/display/Hive/LanguageManual
    https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF
  2. Internal (managed) Tables vs External Tables
    See: https://cwiki.apache.org/confluence/display/Hive/Managed+vs.+External+Tables

    • Internal (managed) Table: Hive manages both the metadata and the data of the table.
      If a managed table is dropped, both the table data and its metadata are deleted.
      The data of a managed table (for a specific database DATABASE-NAME.db) is stored under the directory set by hive.metastore.warehouse.dir (/hive/warehouse/ in the examples below; the stock Hive default is /user/hive/warehouse/).

      CREATE TABLE test_internal_table (field1 INT, field2 STRING);

    • External Table: Hive manages only the metadata of the table; the data resides at a location specified when the table is created.
      If an external table is dropped, only the table metadata is deleted (see the drop sketch at the end of this section).

      CREATE EXTERNAL TABLE test_external_table (field1 INT, field2 STRING) LOCATION 'hdfs://localhost:8020/test_db/test_external_table';

    You can use the command DESCRIBE FORMATTED table_name to determine whether a table is internal (managed) or external.
    The command displays Table Type: MANAGED_TABLE for internal tables and Table Type: EXTERNAL_TABLE for external tables.
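
    If you only need the table type, you can filter that single line out of the output (a quick sketch; hive -e runs one quoted statement and grep keeps the line shown in the full listings below):

    $ hive -e 'describe formatted test_internal_table;' | grep 'Table Type'
    Table Type:             MANAGED_TABLE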

    Describe the internal table:
    hive> describe formatted test_internal_table;
    OK
    # col_name                data_type               comment
    field1                  int
    field2                  string
    
    # Detailed Table Information
    Database:               default
    OwnerType:              USER
    Owner:                  hive
    CreateTime:             Sun Dec 30 10:58:22 EDT 2018
    LastAccessTime:         UNKNOWN
    Retention:              0
    Location:               hdfs://localhost:8020/hive/warehouse/test_internal_table
    Table Type:             MANAGED_TABLE
    Table Parameters:
        COLUMN_STATS_ACCURATE    {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"field1\":\"true\",\"field2\":\"true\"}}
        bucketing_version       2
        numFiles                0
        numRows                 0
        rawDataSize             0
        totalSize               0
        transient_lastDdlTime    1559487502
    
    # Storage Information
    SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
    InputFormat:            org.apache.hadoop.mapred.TextInputFormat
    OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
    Compressed:             No
    Num Buckets:            -1
    Bucket Columns:         []
    Sort Columns:           []
    Storage Desc Params:
        serialization.format    1
    Time taken: 0.084 seconds, Fetched: 32 row(s)

    Describe the external table:
    hive> describe formatted test_external_table;
    OK
    # col_name                data_type               comment
    field1                  int
    field2                  string
    
    # Detailed Table Information
    Database:               test
    OwnerType:              USER
    Owner:                  hive
    CreateTime:             Sun Dec 30 10:58:22 EDT 2018
    LastAccessTime:         UNKNOWN
    Retention:              0
    Location:               hdfs://localhost:8020/test_db/test_external_table
    Table Type:             EXTERNAL_TABLE
    Table Parameters:
        EXTERNAL                TRUE
        bucketing_version       2
        numFiles                0
        totalSize               0
        transient_lastDdlTime    1559487888
    
    # Storage Information
    SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
    InputFormat:            org.apache.hadoop.mapred.TextInputFormat
    OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
    Compressed:             No
    Num Buckets:            -1
    Bucket Columns:         []
    Sort Columns:           []
    Storage Desc Params:
        serialization.format    1
    Time taken: 0.045 seconds, Fetched: 30 row(s)
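
    To observe the drop behavior described earlier (a sketch assuming the two tables and locations created above; re-create the tables afterwards if you still need them):

    hive> DROP TABLE test_internal_table;
    $ hdfs dfs -ls hdfs://localhost:8020/hive/warehouse/test_internal_table    # fails: the directory was removed along with the table
    
    hive> DROP TABLE test_external_table;
    $ hdfs dfs -ls hdfs://localhost:8020/test_db/test_external_table    # still succeeds: only the metadata was removed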
  3. Supported file formats
    See this page for more details about supported file formats:
    https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL

    Create Table statement:
    CREATE TABLE table_name
      [(col_name data_type [COMMENT col_comment], ... [constraint_specification])]
      [STORED AS file_format]

    Supported file formats:
    TEXTFILE    -- (Default, depending on hive.default.fileformat configuration)
    ORC         -- (Note: Available in Hive 0.11.0 and later)
    PARQUET     -- (Note: Available in Hive 0.13.0 and later)
    AVRO        -- (Note: Available in Hive 0.14.0 and later)
    JSONFILE    -- (Note: Available in Hive 4.0.0 and later)
    RCFILE      -- (Note: Available in Hive 0.6.0 and later)

    Example: ORC Table
    CREATE EXTERNAL TABLE test_external_table (field1 INT, field2 STRING) STORED AS ORC;
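
    The session default format can also be changed through the hive.default.fileformat property mentioned above (a sketch; the table name test_default_orc is illustrative):

    hive> SET hive.default.fileformat=ORC;
    hive> CREATE TABLE test_default_orc (field1 INT, field2 STRING);   -- stored as ORC even without STORED AS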
  4. Create a database (using external tables)
    Create HDFS location:
    $ su - hadoop
    $ hdfs dfs -mkdir hdfs://localhost:8020/test_db
    $ hdfs dfs -chown hive hdfs://localhost:8020/test_db
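
    A quick check that the directory exists and is owned by the "hive" user (standard hdfs dfs listing):

    $ hdfs dfs -ls hdfs://localhost:8020/ | grep test_db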

    Switch to "hive" user:
    $ su - hive

    Create Hive database:
    $ hive
    > create database test_db;
    > exit;
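
    You can verify the database and its default warehouse location (SHOW DATABASES and DESCRIBE DATABASE are standard HiveQL):

    $ hive
    > show databases;
    > describe database test_db;
    > exit;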

    Define Hive tables:
    $ vi ~/test_db.hql
    CREATE EXTERNAL TABLE test_table_1 (field1 INT, field2 STRING) STORED AS ORC LOCATION 'hdfs://localhost:8020/test_db/test_table_1';
    CREATE EXTERNAL TABLE test_table_2 (field1 INT, field2 STRING) STORED AS ORC LOCATION 'hdfs://localhost:8020/test_db/test_table_2';

    Create Hive tables (using hive command line):
    $ hive --database test_db -f ~/test_db.hql
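
    The created tables can also be listed non-interactively (a sketch using the hive CLI's -e option, which executes a single quoted statement):

    $ hive --database test_db -e 'show tables;'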

    Create Hive tables (using beeline command line):
    $ beeline
    Beeline version 3.1.2 by Apache Hive
    
    beeline> !connect jdbc:hive2://localhost:10000
    Enter username for jdbc:hive2://localhost:10000: hive
    Enter password for jdbc:hive2://localhost:10000: ****
    
    0: jdbc:hive2://localhost:10000> use test_db;
    
    0: jdbc:hive2://localhost:10000> !run /home/hive/test_db.hql
    >>>  CREATE EXTERNAL TABLE test_table_1 (field1 INT, field2 STRING) STORED AS ORC LOCATION 'hdfs://localhost:8020/test_db/test_table_1';
    >>>  CREATE EXTERNAL TABLE test_table_2 (field1 INT, field2 STRING) STORED AS ORC LOCATION 'hdfs://localhost:8020/test_db/test_table_2';
    
    0: jdbc:hive2://localhost:10000> show tables;
    +----------------+
    |    tab_name    |
    +----------------+
    | test_table_1   |
    | test_table_2   |
    +----------------+
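
    To confirm that the tables were created as external, re-use the DESCRIBE FORMATTED check from section 2 (the output should show Table Type: EXTERNAL_TABLE):

    0: jdbc:hive2://localhost:10000> describe formatted test_table_1;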

    Validate Hive database and tables:
    $ hdfs dfs -ls hdfs://localhost:8020/hive/warehouse/test_db.db    # database directory in the Hive warehouse
    $ hdfs dfs -ls hdfs://localhost:8020/test_db    # data directories of the external tables

    Drop Hive database (metadata):
    $ hive
    > drop database test_db cascade;
    > exit;

    Drop Hive database (data):
    # the warehouse directory (hdfs://localhost:8020/hive/warehouse/test_db.db) is deleted automatically when the database is dropped;
    # the external tables' data must be removed manually:
    $ hdfs dfs -rm -r hdfs://localhost:8020/test_db
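
    A final check that the cleanup is complete (both listings should now fail because the paths no longer exist):

    $ hdfs dfs -ls hdfs://localhost:8020/hive/warehouse/test_db.db
    $ hdfs dfs -ls hdfs://localhost:8020/test_db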