HookHBase.md 5.08 KB
Newer Older
1 2
---
name: HBase
3
route: /HookHBase
4 5 6
menu: Documentation
submenu: Hooks
---
7

8 9 10 11 12 13 14
import  themen  from 'theme/styles/styled-colors';
import  * as theme  from 'react-syntax-highlighter/dist/esm/styles/hljs';
import SyntaxHighlighter from 'react-syntax-highlighter';

# Apache Atlas Hook & Bridge for Apache HBase

## HBase Model
15 16 17 18 19 20 21 22 23 24 25 26 27 28
HBase model includes the following types:
   * Entity types:
      * hbase_namespace
         * super-types: !Asset
         * attributes: qualifiedName, name, description, owner, clusterName, parameters, createTime, modifiedTime
      * hbase_table
         * super-types: !DataSet
         * attributes: qualifiedName, name, description, owner, namespace, column_families, uri, parameters, createtime, modifiedtime, maxfilesize, isReadOnly, isCompactionEnabled, isNormalizationEnabled, ReplicaPerRegion, Durability
      * hbase_column_family
         * super-types: !DataSet
         * attributes:  qualifiedName, name, description, owner, columns, createTime, bloomFilterType, compressionType, compactionCompressionType, encryptionType, inMemoryCompactionPolicy, keepDeletedCells, maxversions, minVersions, datablockEncoding, storagePolicy, ttl, blockCachedEnabled, cacheBloomsOnWrite, cacheDataOnWrite, evictBlocksOnClose, prefetchBlocksOnOpen, newVersionsBehavior, isMobEnabled, mobCompactPartitionPolicy

HBase entities are created and de-duped in Atlas using unique attribute qualifiedName, whose value should be formatted as detailed below. Note that namespaceName, tableName and columnFamilyName should be in lower case.

29 30 31 32 33
<SyntaxHighlighter wrapLines={true} language="java" style={theme.dark}>
{`hbase_namespace.qualifiedName:      <namespaceName>@<clusterName>
hbase_table.qualifiedName:          <namespaceName>:<tableName>@<clusterName>
hbase_column_family.qualifiedName:  <namespaceName>:<tableName>.<columnFamilyName>@<clusterName>`}
</SyntaxHighlighter>
34

35 36

## HBase Hook
37 38 39
Atlas HBase hook registers with HBase master as a co-processor. On detecting changes to HBase namespaces/tables/column-families, Atlas hook updates the metadata in Atlas via Kafka notifications.
Follow the instructions below to setup Atlas hook in HBase:
   * Register Atlas hook in hbase-site.xml by adding the following:
40 41 42 43 44 45 46 47
  
<SyntaxHighlighter wrapLines={true} language="xml" style={theme.dark}>
{`<property>
<name>hbase.coprocessor.master.classes</name>
<value>org.apache.atlas.hbase.hook.HBaseAtlasCoprocessor</value>
</property>`}
</SyntaxHighlighter>

48 49
   * untar apache-atlas-${project.version}-hbase-hook.tar.gz
   * cd apache-atlas-hbase-hook-${project.version}
50 51 52
   * Copy entire contents of folder apache-atlas-hbase-hook-${project.version}/hook/hbase to `<atlas package>`/hook/hbase
   * Link Atlas hook jars in HBase classpath - 'ln -s `<atlas package>`/hook/hbase/* `<hbase-home>`/lib/'
   * Copy `<atlas-conf>`/atlas-application.properties to the HBase conf directory.
53 54

The following properties in atlas-application.properties control the thread pool and notification details:
55 56 57

<SyntaxHighlighter wrapLines={true} language="java" style={theme.dark}>
{`atlas.hook.hbase.synchronous=false # whether to run the hook synchronously. false recommended to avoid delays in HBase operations. Default: false
58 59 60 61 62 63
atlas.hook.hbase.numRetries=3      # number of retries for notification failure. Default: 3
atlas.hook.hbase.queueSize=10000   # queue size for the threadpool. Default: 10000
atlas.cluster.name=primary # clusterName to use in qualifiedName of entities. Default: primary
atlas.kafka.zookeeper.connect=                    # Zookeeper connect URL for Kafka. Example: localhost:2181
atlas.kafka.zookeeper.connection.timeout.ms=30000 # Zookeeper connection timeout. Default: 30000
atlas.kafka.zookeeper.session.timeout.ms=60000    # Zookeeper session timeout. Default: 60000
64 65
atlas.kafka.zookeeper.sync.time.ms=20             # Zookeeper sync time. Default: 20`}
</SyntaxHighlighter>
66 67

Other configurations for Kafka notification producer can be specified by prefixing the configuration name with "atlas.kafka.".
68
For list of configuration supported by Kafka producer, please refer to [Kafka Producer Configs](http://kafka.apache.org/documentation/#producerconfigs)
69

70
## NOTES
71 72 73
   * Only the namespace, table and column-family create/update/ delete operations are captured by Atlas HBase hook. Changes to columns are be captured.


74
## Importing HBase Metadata
75 76 77 78
Apache Atlas provides a command-line utility, import-hbase.sh, to import metadata of Apache HBase namespaces and tables into Apache Atlas.
This utility can be used to initialize Apache Atlas with namespaces/tables present in a Apache HBase cluster.
This utility supports importing metadata of a specific table, tables in a specific namespace or all tables.

79 80
<SyntaxHighlighter wrapLines={true} language="java" style={theme.dark}>
{`Usage 1: <atlas package>/hook-bin/import-hbase.sh
81 82 83 84 85
Usage 2: <atlas package>/hook-bin/import-hbase.sh [-n <namespace regex> OR --namespace <namespace regex>] [-t <table regex> OR --table <table regex>]
Usage 3: <atlas package>/hook-bin/import-hbase.sh [-f <filename>]
           File Format:
             namespace1:tbl1
             namespace1:tbl2
86 87
             namespace2:tbl1`}
</SyntaxHighlighter>