---+ Hive DGI Bridge

Hive metadata can be modelled in DGI using its Type System. The default modelling is available in org.apache.hadoop.metadata.hive.model.HiveDataModelGenerator. It defines the following types:
   * hive_object_type(EnumType) - [GLOBAL, DATABASE, TABLE, PARTITION, COLUMN]
   * hive_resource_type(EnumType) - [JAR, FILE, ARCHIVE]
   * hive_principal_type(EnumType) - [USER, ROLE, GROUP]
   * hive_function_type(EnumType) - [JAVA]
   * hive_order(StructType) - [col, order]
   * hive_resourceuri(StructType) - [resourceType, uri]
   * hive_serde(StructType) - [name, serializationLib, parameters]
   * hive_process(ClassType) - [processName, startTime, endTime, userName, sourceTableNames, targetTableNames, queryText, queryPlan, queryId, queryGraph]
   * hive_function(ClassType) - [functionName, dbName, className, ownerName, ownerType, createTime, functionType, resourceUris]
   * hive_type(ClassType) - [name, type1, type2, fields]
   * hive_partition(ClassType) - [values, dbName, tableName, createTime, lastAccessTime, sd, parameters]
   * hive_storagedesc(ClassType) - [cols, location, inputFormat, outputFormat, compressed, numBuckets, serdeInfo, bucketCols, sortCols, parameters, storedAsSubDirectories]
   * hive_index(ClassType) - [indexName, indexHandlerClass, dbName, createTime, lastAccessTime, origTableName, indexTableName, sd, parameters, deferredRebuild]
   * hive_role(ClassType) - [roleName, createTime, ownerName]
   * hive_column(ClassType) - [name, type, comment]
   * hive_db(ClassType) - [name, description, locationUri, parameters, ownerName, ownerType]
   * hive_table(ClassType) - [tableName, dbName, owner, createTime, lastAccessTime, retention, sd, partitionKeys, parameters, viewOriginalText, viewExpandedText, tableType, temporary]

---++ Importing Hive Metadata

org.apache.hadoop.metadata.hive.bridge.HiveMetaStoreBridge imports the Hive metadata into DGI using the type system defined in org.apache.hadoop.metadata.hive.model.HiveDataModelGenerator.
The import-hive.sh command can be used to facilitate this. Set up the following configuration in hive-site.xml of your Hive set-up, and set the environment variable HIVE_CONFIG to the Hive conf directory:
   * DGI endpoint - add the following property with the DGI endpoint for your set-up:
<verbatim>
<property>
  <name>hive.hook.dgi.url</name>
  <value>http://localhost:21000/</value>
</property>
</verbatim>

Usage: <dgi package>/bin/import-hive.sh

---++ Hive Hook

Hive supports listeners on command execution through hive hooks. This is used to add/update/remove entities in DGI using the model defined in org.apache.hadoop.metadata.hive.model.HiveDataModelGenerator. The hook submits the request to a thread pool executor so that command execution is not blocked. Follow these instructions in your Hive set-up to add the hive hook for DGI:
   * Add org.apache.hadoop.metadata.hive.hook.HiveHook as a post-execution hook in hive-site.xml:
<verbatim>
<property>
  <name>hive.exec.post.hooks</name>
  <value>org.apache.hadoop.metadata.hive.hook.HiveHook</value>
</property>
</verbatim>
   * Add the following property in hive-site.xml with the DGI endpoint for your set-up:
<verbatim>
<property>
  <name>hive.hook.dgi.url</name>
  <value>http://localhost:21000/</value>
</property>
</verbatim>
   * Add 'export HIVE_AUX_JARS_PATH=<dgi package>/hook/hive' in hive-env.sh

The following properties in hive-site.xml control the thread pool details:
   * hive.hook.dgi.minThreads - core number of threads. default 5
   * hive.hook.dgi.maxThreads - maximum number of threads. default 5
   * hive.hook.dgi.keepAliveTime - keep alive time in msecs. default 10
   * hive.hook.dgi.synchronous - boolean, true to run the hook synchronously. default false
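The thread pool properties above map naturally onto a java.util.concurrent.ThreadPoolExecutor. The sketch below only illustrates how those settings behave; HookExecutorSketch is a hypothetical class name, and the actual HiveHook implementation may wire its executor differently:

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Illustrative sketch only - not the actual hook code. The parameters mirror
// the hive.hook.dgi.* properties and their defaults described above.
public class HookExecutorSketch {

    public static ThreadPoolExecutor buildExecutor(int minThreads, int maxThreads,
                                                   long keepAliveMs) {
        return new ThreadPoolExecutor(
                minThreads,                          // hive.hook.dgi.minThreads (default 5)
                maxThreads,                          // hive.hook.dgi.maxThreads (default 5)
                keepAliveMs,                         // hive.hook.dgi.keepAliveTime (default 10 msecs)
                TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<Runnable>()); // queued work does not block the caller
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor executor = buildExecutor(5, 5, 10);
        // Submitting the notification instead of running it inline is what
        // keeps the hook from blocking the hive command execution
        // (unless hive.hook.dgi.synchronous is set to true).
        executor.submit(() -> System.out.println("notify DGI endpoint"));
        executor.shutdown();
        executor.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

With core and maximum pool sizes both defaulting to 5, the pool is effectively fixed-size; raising hive.hook.dgi.maxThreads above minThreads would let it grow under load and shrink back after the keep-alive time.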