Commit 880ea4b6 by Madhan Neethiraj

ATLAS-2647: updated documentation on notification, hooks and basic-search

parent 1fc88ce3
...@@ -48,11 +48,11 @@ notification events. Events are written by the hooks and Atlas to different Kafk
Atlas supports integration with many sources of metadata out of the box. More integrations will be added in future
as well. Currently, Atlas supports ingesting and managing metadata from the following sources:
* [[Hook-HBase][HBase]]
* [[Hook-Hive][Hive]]
* [[Hook-Sqoop][Sqoop]]
* [[Hook-Storm][Storm]]
* [[Bridge-Kafka][Kafka]]

The integration implies two things:
There are metadata models that Atlas defines natively to represent objects of these components.
......
---+ HBase Atlas Bridge
---++ HBase Model
The default HBase model includes the following types:
* Entity types:
* hbase_namespace
* super-types: !Asset
* attributes: name, owner, description, type, classifications, term, clustername, parameters, createtime, modifiedtime, qualifiedName
* hbase_table
* super-types: !DataSet
* attributes: name, owner, description, type, classifications, term, uri, column_families, namespace, parameters, createtime, modifiedtime, maxfilesize,
isReadOnly, isCompactionEnabled, isNormalizationEnabled, ReplicaPerRegion, Durability, qualifiedName
* hbase_column_family
* super-types: !DataSet
* attributes: name, owner, description, type, classifications, term, columns, createtime, bloomFilterType, compressionType, CompactionCompressionType, EncryptionType,
inMemoryCompactionPolicy, keepDeletedCells, Maxversions, MinVersions, datablockEncoding, storagePolicy, Ttl, blockCachedEnabled, cacheBloomsOnWrite,
cacheDataOnWrite, EvictBlocksOnClose, PrefetchBlocksOnOpen, NewVersionsBehavior, isMobEnabled, MobCompactPartitionPolicy, qualifiedName
The entities are created and de-duped using unique qualified name. They provide namespace and can be used for querying as well:
* hbase_namespace.qualifiedName - <namespace>@<clusterName>
* hbase_table.qualifiedName - <namespace>:<tableName>@<clusterName>
* hbase_column_family.qualifiedName - <namespace>:<tableName>.<columnFamily>@<clusterName>
---++ Importing HBase Metadata
org.apache.atlas.hbase.bridge.HBaseBridge imports the HBase metadata into Atlas using the model defined above. import-hbase.sh command can be used to facilitate this.
<verbatim>
Usage 1: <atlas package>/hook-bin/import-hbase.sh
Usage 2: <atlas package>/hook-bin/import-hbase.sh [-n <namespace regex> OR --namespace <namespace regex>] [-t <table regex> OR --table <table regex>]
Usage 3: <atlas package>/hook-bin/import-hbase.sh [-f <filename>]
File Format:
namespace1:tbl1
namespace1:tbl2
namespace2:tbl1
</verbatim>
The logs are in <atlas package>/logs/import-hbase.log
---++ HBase Hook
Atlas HBase hook registers with HBase to listen for create/update/delete operations and updates the metadata in Atlas, via Kafka notifications, for the changes in HBase.
Follow the instructions below to setup Atlas hook in HBase:
* Set-up Atlas hook in hbase-site.xml by adding the following:
<verbatim>
<property>
<name>hbase.coprocessor.master.classes</name>
<value>org.apache.atlas.hbase.hook.HBaseAtlasCoprocessor</value>
</property></verbatim>
* Copy <atlas package>/hook/hbase/<All files and folders> to the HBase class path. HBase hook binary files are present in apache-atlas-<release-version>-SNAPSHOT-hbase-hook.tar.gz
* Copy <atlas-conf>/atlas-application.properties to the hbase conf directory.
The following properties in <atlas-conf>/atlas-application.properties control the thread pool and notification details:
* atlas.hook.hbase.synchronous - boolean, true to run the hook synchronously. default false. Recommended to be set to false to avoid delays in HBase operations.
* atlas.hook.hbase.numRetries - number of retries for notification failure. default 3
* atlas.hook.hbase.minThreads - core number of threads. default 1
* atlas.hook.hbase.maxThreads - maximum number of threads. default 5
* atlas.hook.hbase.keepAliveTime - keep alive time in msecs. default 10
* atlas.hook.hbase.queueSize - queue size for the threadpool. default 10000
Refer [[Configuration][Configuration]] for notification related configurations
---++ NOTES
* Only the namespace, table and column-family create/update/delete operations are captured by the hook. Column changes won't be captured and propagated.
\ No newline at end of file
---+ Apache Atlas Hook for Apache Kafka

---++ Kafka Model
Kafka model includes the following types:
* Entity types:
* kafka_topic
* super-types: !DataSet
* attributes: qualifiedName, name, description, owner, topic, uri, partitionCount

Kafka entities are created and de-duped in Atlas using unique attribute qualifiedName, whose value should be formatted as detailed below.
Note that qualifiedName will have topic name in lower case.

<verbatim>
topic.qualifiedName: <topic>@<clusterName>
</verbatim>

---++ Setup
Binary files are present in apache-atlas-<release-version>-kafka-hook.tar.gz

Copy apache-atlas-kafka-hook-<release-version>/hook/kafka folder to <atlas package>/hook/ directory

Copy apache-atlas-kafka-hook-<release-version>/hook-bin folder to <atlas package>/hook-bin directory

* Copy <atlas-conf>/atlas-application.properties to the Kafka conf directory.

---++ Importing Kafka Metadata
Apache Atlas provides a command-line utility, import-kafka.sh, to import metadata of Apache Kafka topics into Apache Atlas.
This utility can be used to initialize Apache Atlas with topics present in Apache Kafka.
This utility supports importing metadata of a specific topic or all topics.

<verbatim>
Usage 1: <atlas package>/hook-bin/import-kafka.sh
Usage 2: <atlas package>/hook-bin/import-kafka.sh [-t <topic prefix> OR --topic <topic prefix>]
Usage 3: <atlas package>/hook-bin/import-kafka.sh [-f <filename>]
File Format:
topic1
topic2
topic3
</verbatim>

The logs are in <atlas package>/logs/import-kafka.log

Refer [[Configuration][Configuration]] for notification related configurations
---+ Sqoop Atlas Bridge
---++ Sqoop Model
The default Sqoop model includes the following types:
* Entity types:
* sqoop_process
* super-types: Process
* attributes: name, operation, dbStore, hiveTable, commandlineOpts, startTime, endTime, userName
* sqoop_dbdatastore
* super-types: !DataSet
* attributes: name, dbStoreType, storeUse, storeUri, source, description, ownerName
* Enum types:
* sqoop_operation_type
* values: IMPORT, EXPORT, EVAL
* sqoop_dbstore_usage
* values: TABLE, QUERY, PROCEDURE, OTHER
The entities are created and de-duped using unique qualified name. They provide namespace and can be used for querying as well:
* sqoop_process.qualifiedName - dbStoreType-storeUri-endTime
* sqoop_dbdatastore.qualifiedName - dbStoreType-storeUri-source
---++ Sqoop Hook
Sqoop added a !SqoopJobDataPublisher that publishes data to Atlas after completion of an import job. Today, only hiveImport is supported in !SqoopHook.
This is used to add entities in Atlas using the model detailed above.
Follow the instructions below to set up the Atlas hook in Sqoop:
Add the following properties to enable the Atlas hook in Sqoop:
* Set-up Atlas hook in <sqoop-conf>/sqoop-site.xml by adding the following:
<verbatim>
<property>
<name>sqoop.job.data.publish.class</name>
<value>org.apache.atlas.sqoop.hook.SqoopHook</value>
</property></verbatim>
* Copy <atlas-conf>/atlas-application.properties to the Sqoop conf directory <sqoop-conf>/
* Link <atlas-home>/hook/sqoop/*.jar into the Sqoop lib directory
Refer [[Configuration][Configuration]] for notification related configurations
---++ NOTES
* Only the following sqoop operations are captured by sqoop hook currently - hiveImport
---+ Apache Atlas Hook & Bridge for Apache HBase
---++ HBase Model
HBase model includes the following types:
* Entity types:
* hbase_namespace
* super-types: !Asset
* attributes: qualifiedName, name, description, owner, clusterName, parameters, createTime, modifiedTime
* hbase_table
* super-types: !DataSet
* attributes: qualifiedName, name, description, owner, namespace, column_families, uri, parameters, createtime, modifiedtime, maxfilesize, isReadOnly, isCompactionEnabled, isNormalizationEnabled, ReplicaPerRegion, Durability
* hbase_column_family
* super-types: !DataSet
* attributes: qualifiedName, name, description, owner, columns, createTime, bloomFilterType, compressionType, compactionCompressionType, encryptionType, inMemoryCompactionPolicy, keepDeletedCells, maxversions, minVersions, datablockEncoding, storagePolicy, ttl, blockCachedEnabled, cacheBloomsOnWrite, cacheDataOnWrite, evictBlocksOnClose, prefetchBlocksOnOpen, newVersionsBehavior, isMobEnabled, mobCompactPartitionPolicy
HBase entities are created and de-duped in Atlas using unique attribute qualifiedName, whose value should be formatted as detailed below. Note that namespaceName, tableName and columnFamilyName should be in lower case.
<verbatim>
hbase_namespace.qualifiedName: <namespaceName>@<clusterName>
hbase_table.qualifiedName: <namespaceName>:<tableName>@<clusterName>
hbase_column_family.qualifiedName: <namespaceName>:<tableName>.<columnFamilyName>@<clusterName>
</verbatim>
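The snippet below is a minimal sketch of how these qualifiedName values can be assembled; the helper class and method names are illustrative and are not part of Atlas.
<verbatim>
// Illustrative helper (not part of Atlas) that builds qualifiedName values
// following the formats documented above.
public final class HBaseQualifiedNames {
    private HBaseQualifiedNames() { }

    public static String namespace(String namespaceName, String clusterName) {
        return namespaceName.toLowerCase() + "@" + clusterName;
    }

    public static String table(String namespaceName, String tableName, String clusterName) {
        return namespaceName.toLowerCase() + ":" + tableName.toLowerCase() + "@" + clusterName;
    }

    public static String columnFamily(String namespaceName, String tableName, String columnFamilyName, String clusterName) {
        return namespaceName.toLowerCase() + ":" + tableName.toLowerCase() + "." + columnFamilyName.toLowerCase() + "@" + clusterName;
    }
}

// Example: columnFamily("default", "customers", "contact", "primary")
// evaluates to "default:customers.contact@primary"
</verbatim>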
---++ HBase Hook
Atlas HBase hook registers with HBase master as a co-processor. On detecting changes to HBase namespaces/tables/column-families, Atlas hook updates the metadata in Atlas via Kafka notifications.
Follow the instructions below to setup Atlas hook in HBase:
* Register Atlas hook in hbase-site.xml by adding the following:
<verbatim>
<property>
<name>hbase.coprocessor.master.classes</name>
<value>org.apache.atlas.hbase.hook.HBaseAtlasCoprocessor</value>
</property></verbatim>
* Copy entire contents of folder <atlas package>/hook/hbase to HBase class path.
* Copy <atlas-conf>/atlas-application.properties to the HBase conf directory.
The following properties in atlas-application.properties control the thread pool and notification details:
<verbatim>
atlas.hook.hbase.synchronous=false # whether to run the hook synchronously. false recommended to avoid delays in HBase operations. Default: false
atlas.hook.hbase.numRetries=3 # number of retries for notification failure. Default: 3
atlas.hook.hbase.queueSize=10000 # queue size for the threadpool. Default: 10000
atlas.cluster.name=primary # clusterName to use in qualifiedName of entities. Default: primary
atlas.kafka.zookeeper.connect= # Zookeeper connect URL for Kafka. Example: localhost:2181
atlas.kafka.zookeeper.connection.timeout.ms=30000 # Zookeeper connection timeout. Default: 30000
atlas.kafka.zookeeper.session.timeout.ms=60000 # Zookeeper session timeout. Default: 60000
atlas.kafka.zookeeper.sync.time.ms=20 # Zookeeper sync time. Default: 20
</verbatim>
Other configurations for Kafka notification producer can be specified by prefixing the configuration name with "atlas.kafka.".
For list of configuration supported by Kafka producer, please refer to [[http://kafka.apache.org/documentation/#producerconfigs][Kafka Producer Configs]]
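To illustrate this prefixing convention, the sketch below shows how properties carrying the "atlas.kafka." prefix could be collected into a java.util.Properties object for a Kafka producer. This is a simplified illustration of the convention, not the hook's actual implementation.
<verbatim>
import java.util.Properties;

// Simplified illustration of the "atlas.kafka." prefixing convention:
// entries such as atlas.kafka.zookeeper.connect are passed to Kafka
// with the prefix stripped (zookeeper.connect).
public class AtlasKafkaProperties {
    private static final String PREFIX = "atlas.kafka.";

    public static Properties extractKafkaProperties(Properties atlasProperties) {
        Properties kafkaProperties = new Properties();
        for (String name : atlasProperties.stringPropertyNames()) {
            if (name.startsWith(PREFIX)) {
                kafkaProperties.setProperty(name.substring(PREFIX.length()), atlasProperties.getProperty(name));
            }
        }
        return kafkaProperties;
    }
}
</verbatim>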
---++ NOTES
* Only the namespace, table and column-family create/update/delete operations are captured by Atlas HBase hook. Changes to columns are not captured.
---++ Importing HBase Metadata
Apache Atlas provides a command-line utility, import-hbase.sh, to import metadata of Apache HBase namespaces and tables into Apache Atlas.
This utility can be used to initialize Apache Atlas with namespaces/tables present in an Apache HBase cluster.
This utility supports importing metadata of a specific table, tables in a specific namespace or all tables.
<verbatim>
Usage 1: <atlas package>/hook-bin/import-hbase.sh
Usage 2: <atlas package>/hook-bin/import-hbase.sh [-n <namespace regex> OR --namespace <namespace regex>] [-t <table regex> OR --table <table regex>]
Usage 3: <atlas package>/hook-bin/import-hbase.sh [-f <filename>]
File Format:
namespace1:tbl1
namespace1:tbl2
namespace2:tbl1
</verbatim>
---+ Apache Atlas Hook & Bridge for Apache Hive

---++ Hive Model
Hive model includes the following types:
* Entity types:
* hive_db
* super-types: !Asset
* attributes: qualifiedName, name, description, owner, clusterName, location, parameters, ownerName
* hive_table
* super-types: !DataSet
* attributes: qualifiedName, name, description, owner, db, createTime, lastAccessTime, comment, retention, sd, partitionKeys, columns, aliases, parameters, viewOriginalText, viewExpandedText, tableType, temporary
* hive_column
* super-types: !DataSet
* attributes: qualifiedName, name, description, owner, type, comment, table
* hive_storagedesc
* super-types: Referenceable
* attributes: qualifiedName, table, location, inputFormat, outputFormat, compressed, numBuckets, serdeInfo, bucketCols, sortCols, parameters, storedAsSubDirectories
* hive_process
* super-types: Process
* attributes: qualifiedName, name, description, owner, inputs, outputs, startTime, endTime, userName, operationType, queryText, queryPlan, queryId, clusterName
* hive_column_lineage
* super-types: Process
* attributes: qualifiedName, name, description, owner, inputs, outputs, query, depenendencyType, expression
* Enum types:
* hive_principal_type
...@@ -32,19 +32,13 @@ The default hive model includes the following types:
* hive_serde
* attributes: name, serializationLib, parameters

Hive entities are created and de-duped in Atlas using unique attribute qualifiedName, whose value should be formatted as detailed below. Note that dbName, tableName and columnName should be in lower case.
<verbatim>
hive_db.qualifiedName:     <dbName>@<clusterName>
hive_table.qualifiedName:  <dbName>.<tableName>@<clusterName>
hive_column.qualifiedName: <dbName>.<tableName>.<columnName>@<clusterName>
hive_process.queryString:  trimmed query string in lower case
</verbatim>
---++ Hive Hook
...@@ -59,15 +53,21 @@ Follow the instructions below to setup Atlas hook in Hive:
* Add 'export HIVE_AUX_JARS_PATH=<atlas package>/hook/hive' in hive-env.sh of your hive configuration
* Copy <atlas-conf>/atlas-application.properties to the hive conf directory.

The following properties in atlas-application.properties control the thread pool and notification details:
<verbatim>
atlas.hook.hive.synchronous=false # whether to run the hook synchronously. false recommended to avoid delays in Hive query completion. Default: false
atlas.hook.hive.numRetries=3      # number of retries for notification failure. Default: 3
atlas.hook.hive.queueSize=10000   # queue size for the threadpool. Default: 10000

atlas.cluster.name=primary # clusterName to use in qualifiedName of entities. Default: primary

atlas.kafka.zookeeper.connect=                    # Zookeeper connect URL for Kafka. Example: localhost:2181
atlas.kafka.zookeeper.connection.timeout.ms=30000 # Zookeeper connection timeout. Default: 30000
atlas.kafka.zookeeper.session.timeout.ms=60000    # Zookeeper session timeout. Default: 60000
atlas.kafka.zookeeper.sync.time.ms=20             # Zookeeper sync time. Default: 20
</verbatim>

Other configurations for Kafka notification producer can be specified by prefixing the configuration name with "atlas.kafka.". For list of configuration supported by Kafka producer, please refer to [[http://kafka.apache.org/documentation/#producerconfigs][Kafka Producer Configs]]
---++ Column Level Lineage
...@@ -114,3 +114,19 @@ The lineage is captured as
* alter database
* alter table (skewed table information, stored as, protection is not supported)
* alter view
---++ Importing Hive Metadata
Apache Atlas provides a command-line utility, import-hive.sh, to import metadata of Apache Hive databases and tables into Apache Atlas.
This utility can be used to initialize Apache Atlas with databases/tables present in Apache Hive.
This utility supports importing metadata of a specific table, tables in a specific database or all databases and tables.
<verbatim>
Usage 1: <atlas package>/hook-bin/import-hive.sh
Usage 2: <atlas package>/hook-bin/import-hive.sh [-d <database regex> OR --database <database regex>] [-t <table regex> OR --table <table regex>]
Usage 3: <atlas package>/hook-bin/import-hive.sh [-f <filename>]
File Format:
database1:tbl1
database1:tbl2
database2:tbl1
</verbatim>
---+ Apache Atlas Hook for Apache Sqoop
---++ Sqoop Model
Sqoop model includes the following types:
* Entity types:
* sqoop_process
* super-types: Process
* attributes: qualifiedName, name, description, owner, inputs, outputs, operation, commandlineOpts, startTime, endTime, userName
* sqoop_dbdatastore
* super-types: !DataSet
* attributes: qualifiedName, name, description, owner, dbStoreType, storeUse, storeUri, source
* Enum types:
* sqoop_operation_type
* values: IMPORT, EXPORT, EVAL
* sqoop_dbstore_usage
* values: TABLE, QUERY, PROCEDURE, OTHER
Sqoop entities are created and de-duped in Atlas using unique attribute qualifiedName, whose value should be formatted as detailed below.
<verbatim>
sqoop_process.qualifiedName: sqoop <operation> --connect <url> {[--table <tableName>] || [--database <databaseName>]} [--query <storeQuery>]
sqoop_dbdatastore.qualifiedName: <storeType> --url <storeUri> {[--table <tableName>] || [--database <databaseName>]} [--query <storeQuery>] --hive-<operation> --hive-database <databaseName> [--hive-table <tableName>] --hive-cluster <clusterName>
</verbatim>
---++ Sqoop Hook
Sqoop added a !SqoopJobDataPublisher that publishes data to Atlas after completion of an import job. Today, only hiveImport is supported in !SqoopHook.
This is used to add entities in Atlas using the model detailed above.

Follow the instructions below to set up the Atlas hook in Sqoop:
Add the following properties to enable the Atlas hook in Sqoop:
* Set-up Atlas hook in <sqoop-conf>/sqoop-site.xml by adding the following:
<verbatim>
<property>
<name>sqoop.job.data.publish.class</name>
<value>org.apache.atlas.sqoop.hook.SqoopHook</value>
</property></verbatim>
* Copy <atlas-conf>/atlas-application.properties to the Sqoop conf directory <sqoop-conf>/
* Link <atlas-home>/hook/sqoop/*.jar into the Sqoop lib directory
The following properties in atlas-application.properties control the thread pool and notification details:
<verbatim>
atlas.hook.sqoop.synchronous=false # whether to run the hook synchronously. false recommended to avoid delays in Sqoop operation completion. Default: false
atlas.hook.sqoop.numRetries=3 # number of retries for notification failure. Default: 3
atlas.hook.sqoop.queueSize=10000 # queue size for the threadpool. Default: 10000
atlas.cluster.name=primary # clusterName to use in qualifiedName of entities. Default: primary
atlas.kafka.zookeeper.connect= # Zookeeper connect URL for Kafka. Example: localhost:2181
atlas.kafka.zookeeper.connection.timeout.ms=30000 # Zookeeper connection timeout. Default: 30000
atlas.kafka.zookeeper.session.timeout.ms=60000 # Zookeeper session timeout. Default: 60000
atlas.kafka.zookeeper.sync.time.ms=20 # Zookeeper sync time. Default: 20
</verbatim>
Other configurations for Kafka notification producer can be specified by prefixing the configuration name with "atlas.kafka.". For list of configuration supported by Kafka producer, please refer to [[http://kafka.apache.org/documentation/#producerconfigs][Kafka Producer Configs]]
---++ NOTES
* Only the following sqoop operations are captured by sqoop hook currently
* hiveImport
---+ Apache Atlas Hook for Apache Storm

---++ Introduction
......
---+ Entity Change Notifications
To receive Atlas entity notifications, a consumer should be obtained through the notification interface. Entity change notifications are sent every time a change is made to an entity. Operations that result in an entity change notification are:
* <code>ENTITY_CREATE</code> - Create a new entity.
* <code>ENTITY_UPDATE</code> - Update an attribute of an existing entity.
* <code>TRAIT_ADD</code> - Add a trait to an entity.
* <code>TRAIT_DELETE</code> - Delete a trait from an entity.
<verbatim>
// Obtain provider through injection…
Provider<NotificationInterface> provider;
// Get the notification interface
NotificationInterface notification = provider.get();
// Create consumers
List<NotificationConsumer<EntityNotification>> consumers =
notification.createConsumers(NotificationInterface.NotificationType.ENTITIES, 1);
</verbatim>
The consumer exposes the Iterator interface that should be used to get the entity notifications as they are posted. The hasNext() method blocks until a notification is available.
<verbatim>
// Use one of the consumers created above
NotificationConsumer<EntityNotification> consumer = consumers.get(0);

while(consumer.hasNext()) {
    EntityNotification notification = consumer.next();
    IReferenceableInstance entity    = notification.getEntity();
}
</verbatim>
---+ Notifications
---++ Notifications from Apache Atlas
Apache Atlas sends notifications about metadata changes to a Kafka topic named ATLAS_ENTITIES.
Applications interested in metadata changes can monitor for these notifications.
For example, Apache Ranger processes these notifications to authorize data access based on classifications.
---+++ Notifications - V2: Apache Atlas version 1.0
Apache Atlas 1.0 sends notifications for the following operations on metadata.
<verbatim>
ENTITY_CREATE: sent when an entity instance is created
ENTITY_UPDATE: sent when an entity instance is updated
ENTITY_DELETE: sent when an entity instance is deleted
CLASSIFICATION_ADD: sent when classifications are added to an entity instance
CLASSIFICATION_UPDATE: sent when classifications of an entity instance are updated
CLASSIFICATION_DELETE: sent when classifications are removed from an entity instance
</verbatim>
Notifications include the following data.
<verbatim>
AtlasEntity entity;
OperationType operationType;
List<AtlasClassification> classifications;
</verbatim>
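For example, an application can consume these notifications directly from the ATLAS_ENTITIES Kafka topic. The sketch below uses the plain Kafka consumer API and assumes a broker at localhost:9092 and a recent kafka-clients library; each record value is the JSON-serialized notification described above.
<verbatim>
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class AtlasEntitiesListener {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumption: local Kafka broker
        props.put("group.id", "atlas-entities-demo");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("ATLAS_ENTITIES"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
                for (ConsumerRecord<String, String> record : records) {
                    // each record value is a JSON notification carrying entity, operationType and classifications
                    System.out.println(record.value());
                }
            }
        }
    }
}
</verbatim>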
---+++ Notifications - V1: Apache Atlas version 0.8.x and earlier
Notifications from Apache Atlas version 0.8.x and earlier have content formatted differently, as detailed below.
__Operations__
<verbatim>
ENTITY_CREATE: sent when an entity instance is created
ENTITY_UPDATE: sent when an entity instance is updated
ENTITY_DELETE: sent when an entity instance is deleted
TRAIT_ADD: sent when classifications are added to an entity instance
TRAIT_UPDATE: sent when classifications of an entity instance are updated
TRAIT_DELETE: sent when classifications are removed from an entity instance
</verbatim>
Notifications include the following data.
<verbatim>
Referenceable entity;
OperationType operationType;
List<Struct> traits;
</verbatim>
Apache Atlas 1.0 can be configured to send notifications in the older version format, instead of the latest version format.
This can be helpful in deployments that are not yet ready to process notifications in the latest version format.
To configure Apache Atlas 1.0 to send notifications in the earlier version format, set the following configuration in
atlas-application.properties:
<verbatim>
atlas.notification.entity.version=v1
</verbatim>
---++ Notifications to Apache Atlas
Apache Atlas can be notified of metadata changes and lineage via notifications sent to a Kafka topic named ATLAS_HOOK.
Atlas hooks for Apache Hive/Apache HBase/Apache Storm/Apache Sqoop use this mechanism to notify Apache Atlas of events of interest.
<verbatim>
ENTITY_CREATE : create an entity. For more details, refer to Java class HookNotificationV1.EntityCreateRequest
ENTITY_FULL_UPDATE : update an entity. For more details, refer to Java class HookNotificationV1.EntityUpdateRequest
ENTITY_PARTIAL_UPDATE : update specific attributes of an entity. For more details, refer to HookNotificationV1.EntityPartialUpdateRequest
ENTITY_DELETE : delete an entity. For more details, refer to Java class HookNotificationV1.EntityDeleteRequest
ENTITY_CREATE_V2 : create an entity. For more details, refer to Java class HookNotification.EntityCreateRequestV2
ENTITY_FULL_UPDATE_V2 : update an entity. For more details, refer to Java class HookNotification.EntityUpdateRequestV2
ENTITY_PARTIAL_UPDATE_V2 : update specific attributes of an entity. For more details, refer to HookNotification.EntityPartialUpdateRequestV2
ENTITY_DELETE_V2 : delete one or more entities. For more details, refer to Java class HookNotification.EntityDeleteRequestV2
</verbatim>
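Below is a minimal sketch of publishing a notification to the ATLAS_HOOK topic with the plain Kafka producer API, assuming a broker at localhost:9092. The JSON payload shown is illustrative only; real hooks serialize HookNotification objects such as those listed above, so treat the exact message format as an assumption to verify against your Atlas version.
<verbatim>
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AtlasHookNotifier {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // assumption: local Kafka broker
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        // Illustrative payload only -- actual hook messages are serialized HookNotification objects
        String message = "{ \"type\": \"ENTITY_CREATE_V2\", \"user\": \"hive\", \"entities\": { } }";

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("ATLAS_HOOK", message));
            producer.flush();
        }
    }
}
</verbatim>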
...@@ -7,114 +7,111 @@ The entire query structure can be represented using the following JSON structure
<verbatim>
{
  "typeName":               "hive_column",
  "excludeDeletedEntities": true,
  "classification":         "PII",
  "query":                  "",
  "offset":                 0,
  "limit":                  25,
  "entityFilters":          { },
  "tagFilters":             { },
  "attributes":             [ "table", "qualifiedName" ]
}
</verbatim>
__Field description__
<verbatim>
typeName:               the type of entity to look for
excludeDeletedEntities: should the search exclude deleted entities? (default: true)
classification:         only include entities with given classification
query:                  any free text occurrence that the entity should have (generic/wildcard queries might be slow)
offset:                 starting offset of the result set (useful for pagination)
limit:                  max number of results to fetch
entityFilters:          entity attribute filter(s)
tagFilters:             classification attribute filter(s)
attributes:             attributes to include in the search result
</verbatim>

<img src="images/twiki/search-basic-hive_column-PII.png" height="400" width="600"/>

Attribute based filtering can be done on multiple attributes with AND/OR conditions.
__Examples of filtering (for hive_table attributes)__
* Single attribute
<verbatim>
{
  "typeName":               "hive_table",
  "excludeDeletedEntities": true,
  "offset":                 0,
  "limit":                  25,
  "entityFilters": {
     "attributeName":  "name",
     "operator":       "contains",
     "attributeValue": "customers"
  },
  "attributes": [ "db", "qualifiedName" ]
}
</verbatim>
<img src="images/twiki/search-basic-hive_table-customers.png" height="400" width="600"/>
* Multi-attribute with OR
<verbatim>
{
  "typeName":               "hive_table",
  "excludeDeletedEntities": true,
  "offset":                 0,
  "limit":                  25,
  "entityFilters": {
     "condition": "OR",
     "criterion": [
        {
           "attributeName":  "name",
           "operator":       "contains",
           "attributeValue": "customers"
        },
        {
           "attributeName":  "name",
           "operator":       "contains",
           "attributeValue": "provider"
        }
     ]
  },
  "attributes": [ "db", "qualifiedName" ]
}
</verbatim>
<img src="images/twiki/search-basic-hive_table-customers-or-provider.png" height="400" width="600"/>
* Multi-attribute with AND
<verbatim>
{
  "typeName":               "hive_table",
  "excludeDeletedEntities": true,
  "offset":                 0,
  "limit":                  25,
  "entityFilters": {
     "condition": "AND",
     "criterion": [
        {
           "attributeName":  "name",
           "operator":       "contains",
           "attributeValue": "customers"
        },
        {
           "attributeName":  "owner",
           "operator":       "eq",
           "attributeValue": "hive"
        }
     ]
  },
  "attributes": [ "db", "qualifiedName" ]
}
</verbatim>
<img src="images/twiki/search-basic-hive_table-customers-owner_is_hive.png" height="400" width="600"/>
__Supported operators for filtering__
* LT (symbols: <, lt) works with Numeric, Date attributes

...@@ -135,29 +132,28 @@ __CURL Samples__

  -u <user>:<password>
  -X POST
  -d '{
        "typeName":               "hive_table",
        "excludeDeletedEntities": true,
        "classification":         "",
        "query":                  "",
        "offset":                 0,
        "limit":                  50,
        "entityFilters": {
           "condition": "AND",
           "criterion": [
              {
                 "attributeName":  "name",
                 "operator":       "contains",
                 "attributeValue": "customers"
              },
              {
                 "attributeName":  "owner",
                 "operator":       "eq",
                 "attributeValue": "hive"
              }
           ]
        },
        "attributes": [ "db", "qualifiedName" ]
      }'
  <protocol>://<atlas_host>:<atlas_port>/api/atlas/v2/search/basic
</verbatim>
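The same request can also be issued programmatically. The sketch below posts a basic-search JSON body with java.net.HttpURLConnection and basic authentication; the server URL http://localhost:21000 and the admin/admin credentials are assumptions for a local test setup.
<verbatim>
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BasicSearchClient {
    public static void main(String[] args) throws Exception {
        String body = "{ \"typeName\": \"hive_table\", \"excludeDeletedEntities\": true,"
                    + "  \"entityFilters\": { \"attributeName\": \"name\", \"operator\": \"contains\", \"attributeValue\": \"customers\" },"
                    + "  \"offset\": 0, \"limit\": 25, \"attributes\": [ \"db\", \"qualifiedName\" ] }";

        URL url = new URL("http://localhost:21000/api/atlas/v2/search/basic"); // assumption: local Atlas server
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/json");
        String auth = Base64.getEncoder().encodeToString("admin:admin".getBytes(StandardCharsets.UTF_8));
        conn.setRequestProperty("Authorization", "Basic " + auth);

        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }

        try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line); // search result JSON
            }
        }
    }
}
</verbatim>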
...@@ -24,6 +24,7 @@ capabilities around these data assets for data scientists, analysts and the data
* Ability to dynamically create classifications - like PII, EXPIRES_ON, DATA_QUALITY, SENSITIVE
* Classifications can include attributes - like expiry_date attribute in EXPIRES_ON classification
* Entities can be associated with multiple classifications, enabling easier discovery and security enforcement
* Propagation of classifications via lineage - automatically ensures that classifications follow the data as it goes through various processing
---+++ Lineage
* Intuitive UI to view lineage of data as it moves through various processes

...@@ -35,7 +36,8 @@ capabilities around these data assets for data scientists, analysts and the data

* SQL like query language to search entities - Domain Specific Language (DSL)

---+++ Security & Data Masking
* Fine grained security for metadata access, enabling controls on access to entity instances and operations like add/update/remove classifications
* Integration with Apache Ranger enables authorization/data-masking on data access based on classifications associated with entities in Apache Atlas. For example:
* who can access data classified as PII, SENSITIVE
* customer-service users can only see last 4 digits of columns classified as NATIONAL_ID

...@@ -50,20 +52,18 @@ capabilities around these data assets for data scientists, analysts and the data

* [[Architecture][High Level Architecture]]
* [[TypeSystem][Type System]]
* [[Search - Basic][Search: Basic]]
* [[Search - Advanced][Search: Advanced]]
* [[security][Security]]
* [[Authentication-Authorization][Authentication and Authorization]]
* [[Configuration][Configuration]]
* [[Notifications][Notifications]]
* Hooks & Bridges
* [[Hook-HBase][HBase Hook & Bridge]]
* [[Hook-Hive][Hive Hook & Bridge]]
* [[Hook-Sqoop][Sqoop Hook]]
* [[Hook-Storm][Storm Hook]]
* [[Bridge-Kafka][Kafka Bridge]]
* [[HighAvailability][Fault Tolerance And High Availability Options]]

---++ API Documentation
......