One falcon_process entity is created for every cluster that the Falcon process is defined for.
The entities are created and de-duplicated using the unique qualifiedName attribute. They provide a namespace and can be used for querying/lineage as well. The unique attributes are:
Falcon supports listeners on Falcon entity submission. This is used to add entities in Atlas using the model detailed above.
The hook submits the request to a thread pool executor to avoid blocking the command execution. The thread submits the entities as messages to the notification server, and the Atlas server reads these messages and registers the entities.
Follow the instructions below to set up the Atlas hook in Falcon:
* Add 'org.apache.atlas.falcon.service.AtlasService' to application.services in <falcon-conf>/startup.properties
* Link Atlas hook jars in the Falcon classpath - 'ln -s <atlas-home>/hook/falcon/* <falcon-home>/server/webapp/falcon/WEB-INF/lib/'
* In <falcon_conf>/falcon-env.sh, set an environment variable as follows:
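A minimal sketch of the environment variable, assuming the Atlas configuration lives in <atlas-conf> (the exact option value may differ for your Falcon version):
<verbatim>
export FALCON_SERVER_OPTS="-Datlas.conf=<atlas-conf>"
</verbatim>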
The following properties in <atlas-conf>/atlas-application.properties control the thread pool and notification details:
* atlas.hook.falcon.synchronous - boolean, true to run the hook synchronously. default false
* atlas.hook.falcon.numRetries - number of retries for notification failure. default 3
* atlas.hook.falcon.minThreads - core number of threads. default 5
* atlas.hook.falcon.maxThreads - maximum number of threads. default 5
* atlas.hook.falcon.keepAliveTime - keep alive time in msecs. default 10
* atlas.hook.falcon.queueSize - queue size for the threadpool. default 10000
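Putting the defaults listed above into <atlas-conf>/atlas-application.properties form, the hook configuration looks like:
<verbatim>
atlas.hook.falcon.synchronous=false
atlas.hook.falcon.numRetries=3
atlas.hook.falcon.minThreads=5
atlas.hook.falcon.maxThreads=5
atlas.hook.falcon.keepAliveTime=10
atlas.hook.falcon.queueSize=10000
</verbatim>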
Refer [[Configuration][Configuration]] for notification related configurations
---++ NOTES
* In the Falcon cluster entity, the cluster name used should be uniform across components like Hive, Falcon, Sqoop etc. If used with Ambari, the Ambari cluster name should be used for the cluster entity
The entities are created and de-duplicated using the unique qualified name. They provide a namespace and can be used for querying/lineage as well. Note that dbName, tableName and columnName should be in lower case. clusterName is explained below.
* hive_process.queryString - trimmed query string in lower case
---++ Importing Hive Metadata
org.apache.atlas.hive.bridge.HiveMetaStoreBridge imports the Hive metadata into Atlas using the model defined above. The import-hive.sh command can be used to facilitate this. The script needs Hadoop and Hive classpath jars.
* For Hadoop jars, please make sure that the environment variable HADOOP_CLASSPATH is set. Another way is to set HADOOP_HOME to point to the root directory of your Hadoop installation
* Similarly, for Hive jars, set HIVE_HOME to the root of the Hive installation
* Set environment variable HIVE_CONF_DIR to the Hive configuration directory
* Copy <atlas-conf>/atlas-application.properties to the hive conf directory
* For details about jaas.conf and a suggested location see the [[security][atlas security documentation]]
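The steps above can be sketched as follows; the paths are illustrative, and the location of import-hive.sh within the Atlas package may vary:
<verbatim>
export HADOOP_CLASSPATH=`hadoop classpath`
export HIVE_HOME=/usr/local/hive
export HIVE_CONF_DIR=/usr/local/hive/conf
cp <atlas-conf>/atlas-application.properties $HIVE_CONF_DIR
<atlas-home>/bin/import-hive.sh
</verbatim>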
---++ Hive Hook
Atlas Hive hook registers with Hive to listen for create/update/delete operations and updates the metadata in Atlas, via Kafka notifications, for the changes in Hive.
The hook submits the request to a thread pool executor to avoid blocking the command execution. The thread submits the entities as messages to the notification server, and the Atlas server reads these messages and registers the entities.
Follow the instructions below to set up the Atlas hook in Hive:
* Set up the Atlas hook in hive-site.xml of your Hive configuration by adding the following:
* Add 'export HIVE_AUX_JARS_PATH=<atlas package>/hook/hive' in hive-env.sh of your hive configuration
* Copy <atlas-conf>/atlas-application.properties to the hive conf directory.
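The hive-site.xml change in the first step above registers the Atlas hook through Hive's standard post-execution hook mechanism; org.apache.atlas.hive.hook.HiveHook is the hook class shipped with Atlas:
<verbatim>
<property>
  <name>hive.exec.post.hooks</name>
  <value>org.apache.atlas.hive.hook.HiveHook</value>
</property>
</verbatim>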
The following properties in <atlas-conf>/atlas-application.properties control the thread pool and notification details:
* atlas.hook.hive.synchronous - boolean, true to run the hook synchronously. default false. Recommended to be set to false to avoid delays in hive query completion.
* atlas.hook.hive.numRetries - number of retries for notification failure. default 3
* atlas.hook.hive.minThreads - core number of threads. default 1
* atlas.hook.hive.maxThreads - maximum number of threads. default 5
* atlas.hook.hive.keepAliveTime - keep alive time in msecs. default 10
* atlas.hook.hive.queueSize - queue size for the threadpool. default 10000
Refer [[Configuration][Configuration]] for notification related configurations
Starting from 0.8-incubating version of Atlas, Column level lineage is captured in Atlas. Below are the details
---+++ Model
* !ColumnLineageProcess type is a subtype of Process
* This relates an output Column to a set of input Columns or the Input Table
* The lineage also captures the kind of dependency, as listed below:
   * SIMPLE: output column has the same value as the input
   * EXPRESSION: output column is transformed by some expression at runtime (for e.g. a Hive SQL expression) on the input columns
   * SCRIPT: output column is transformed by a user provided script
* In case of EXPRESSION dependency the expression attribute contains the expression in string form
* Since Process links input and output !DataSets, Column is a subtype of !DataSet
---+++ Examples
For a simple CTAS below:
<verbatim>
create table t2 as select id, name from T1
</verbatim>
The lineage is captured as
* The !LineageInfo in Hive provides column-level lineage for the final !FileSinkOperator, linking them to the input columns in the Hive Query
---++ NOTES
* Column level lineage works with Hive version 1.2.1 after the patch for <a href="https://issues.apache.org/jira/browse/HIVE-13112">HIVE-13112</a> is applied to Hive source
---++ Limitations
* Since database name, table name and column names are case insensitive in Hive, the corresponding names in entities are lowercase. So, any search APIs should use lowercase while querying on the entity names
* The following Hive operations are currently captured by the hive hook
---++ Graph Configs
---+++ Graph persistence engine
This section sets up the graph db - JanusGraph - to use a persistence engine. Please refer to
<a href="http://docs.janusgraph.org/0.2.0/configuration.html#_hbase_caching">link</a> for more details.

---++++ Graph persistence engine - BerkeleyDB
The example below uses BerkeleyDBJE.
<verbatim>
atlas.graph.storage.backend=berkeleyje
atlas.graph.storage.directory=data/berkeley
</verbatim>

---++++ Graph persistence engine - HBase
Set the following properties to configure JanusGraph to use HBase as the persistence engine. Basic configuration:
<verbatim>
atlas.graph.storage.backend=hbase
#For standalone mode, specify localhost
#For distributed mode, specify the zookeeper quorum here - For more information refer http://s3.thinkaurelius.com/docs/titan/current/hbase.html#_remote_server_mode_2
atlas.graph.storage.hostname=<ZooKeeper Quorum>
atlas.graph.storage.hbase.table=atlas
</verbatim>
HBASE_CONF_DIR environment variable needs to be set to point to the HBase client configuration directory, which is added to the classpath when Atlas starts up.
hbase-site.xml needs to have the following properties set according to the cluster setup:
<verbatim>
#Set below to /hbase-secure if the HBase server is setup in secure mode
zookeeper.znode.parent=/hbase-unsecure
</verbatim>
If any further JanusGraph configuration needs to be set up, please prefix the property name with "atlas.graph.".
Permissions

When Atlas is configured with HBase as the storage backend, the graph db needs sufficient user permissions to be able to create and access an HBase table. In a secure cluster it may be necessary to grant permissions to the 'atlas' user for the HBase table used by Atlas.
With Ranger, a policy can be configured for the table.
Without Ranger, the HBase shell can be used to set the permissions.
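As an illustration (assuming the table is named 'atlas', per the configuration above, and that the command is run as the HBase superuser), the grant could be issued from the HBase shell:
<verbatim>
echo "grant 'atlas', 'RWXCA', 'atlas'" | hbase shell
</verbatim>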
---+++ Graph Search Index - Solr
Please note that Solr installation in Cloud mode is a prerequisite before configuring Solr as the search indexing backend. Refer InstallationSteps section for Solr installation/configuration. Set the following properties to configure JanusGraph to use Solr as the index search engine.
<verbatim>
atlas.graph.index.search.backend=solr5
atlas.graph.index.search.solr.mode=cloud
atlas.graph.index.search.solr.zookeeper-url=<the ZK quorum setup for solr as comma separated value> eg: 10.1.6.4:2181,10.1.6.5:2181
atlas.graph.index.search.solr.zookeeper-connect-timeout=<SolrCloud Zookeeper Connection Timeout>. Default value is 60000 ms
atlas.graph.index.search.solr.zookeeper-session-timeout=<SolrCloud Zookeeper Session Timeout>. Default value is 60000 ms
</verbatim>
Also note that if the embedded-hbase-solr profile is used then Solr is included in the distribution so that a standalone
instance of Solr can be started as the default search indexing backend. Using the embedded-hbase-solr profile will
configure Atlas so that the standalone Solr instance will be started and stopped along with the Atlas server by default.
To use the embedded-hbase-solr profile please see "Building Atlas" in the [[InstallationSteps][Installation Steps]]
section.
---+++ Choosing between Persistence Backends
Refer http://s3.thinkaurelius.com/docs/titan/0.5.4/bdb.html and http://s3.thinkaurelius.com/docs/titan/0.5.4/hbase.html for choosing between the persistence backends.
BerkeleyDB is suitable for smaller data sets, in the range of up to 10 million vertices, with ACID guarantees.
HBase on the other hand doesn't provide ACID guarantees but is able to scale for larger graphs. HBase also provides HA inherently.
---+++ Choosing between Indexing Backends
Refer http://s3.thinkaurelius.com/docs/titan/0.5.4/elasticsearch.html and http://s3.thinkaurelius.com/docs/titan/0.5.4/solr.html for choosing between !ElasticSearch and Solr.
Solr in cloud mode is the recommended setup.
---+++ Switching Persistence Backend
For switching the storage backend from BerkeleyDB to HBase and vice versa, refer to the documentation for "Graph Persistence Engine" described above and restart ATLAS.
The data in the indexing backend needs to be cleared, else there will be discrepancies between the storage and indexing backends, which could result in errors during search.
!ElasticSearch runs by default in embedded mode, and the data can easily be cleared by deleting the ATLAS_HOME/data/es directory.
For Solr, the collections which were created during ATLAS installation - vertex_index, edge_index, fulltext_index - can be deleted to clean up the indexes.
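For example, the Solr collections can be deleted through the Solr Collections API; the host and port below are placeholders:
<verbatim>
curl "http://<solr-host>:8983/solr/admin/collections?action=DELETE&name=vertex_index"
curl "http://<solr-host>:8983/solr/admin/collections?action=DELETE&name=edge_index"
curl "http://<solr-host>:8983/solr/admin/collections?action=DELETE&name=fulltext_index"
</verbatim>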
---+++ Switching Index Backend
Switching the index backend requires clearing the persistence backend data. Otherwise there will be discrepancies between the persistence and index backends, since switching the indexing backend means index data will be lost.
This leads to "Fulltext" queries not working on the existing data.
For clearing the data for BerkeleyDB, delete the ATLAS_HOME/data/berkeley directory.
For clearing the data for HBase, in the HBase shell, run 'disable titan' and 'drop titan'.
---++ Lineage Configs
The higher layer services like lineage, schema, etc. are driven by the type system, and this section encodes the specific types for the Hive data model.
<verbatim>
# This model reflects the base super types for Data and Process
atlas.lineage.hive.table.type.name=DataSet
atlas.lineage.hive.process.type.name=Process
atlas.lineage.hive.process.inputs.name=inputs
atlas.lineage.hive.process.outputs.name=outputs

## Schema
atlas.lineage.hive.table.schema.query=hive_table where name=?, columns
</verbatim>
---++ Search Configs
Search APIs (DSL, basic search, full-text search) support pagination and have optional limit and offset arguments. The following configs are related to search pagination:
<verbatim>
# Default limit used when limit is not specified in API
atlas.search.defaultlimit=100
# Maximum limit allowed in API
atlas.search.maxlimit=10000
</verbatim>
Refer http://kafka.apache.org/documentation.html#configuration for Kafka configuration. All Kafka configs should be prefixed with 'atlas.kafka.'
<verbatim>
atlas.notification.embedded=true
atlas.kafka.data=${sys:atlas.home}/data/kafka
# Zookeeper connect URL for Kafka. Example: localhost:2181
atlas.kafka.zookeeper.connect=localhost:9026
# Kafka servers. Example: localhost:6667
atlas.kafka.bootstrap.servers=localhost:9027
atlas.kafka.zookeeper.session.timeout.ms=400
atlas.kafka.zookeeper.sync.time.ms=20
atlas.kafka.auto.commit.interval.ms=1000
atlas.kafka.auto.commit.enable=false
atlas.kafka.hook.group.id=atlas
</verbatim>
Note that Kafka group ids are specified for a specific topic. The Kafka group id configuration for entity notifications is 'atlas.kafka.entities.group.id'.
<verbatim>
atlas.kafka.entities.group.id=<consumer id>
</verbatim>
These configuration parameters are useful for setting up Kafka topics via Atlas provided scripts, described in the [[InstallationSteps][Installation Steps]] page.
<verbatim>
atlas.kafka.zookeeper.connection.timeout.ms=30000
# Whether to create the topics automatically, default is true.
atlas.notification.create.topics=true
# Comma separated list of topics to be created, default is "ATLAS_HOOK,ATLAS_ENTITIES"
atlas.notification.topics=ATLAS_HOOK,ATLAS_ENTITIES
# If saving messages is enabled, the file name to save them to. This file will be created under the log directory of the hook's host component - like HiveServer2
atlas.notification.failed.messages.filename=atlas_hook_failed_messages.log
</verbatim>
# The format of these options is <scheme>:<identity>. For more information refer to http://zookeeper.apache.org/doc/r3.2.2/zookeeperProgrammers.html#sc_ZooKeeperAccessControl.
# The 'acl' option allows to specify a scheme, identity pair to setup an ACL for.
---++ Metadata Store
As described above, Atlas uses JanusGraph to store the metadata it manages. By default, Atlas uses a standalone HBase
instance as the backing store for JanusGraph. In order to provide HA for the metadata store, we recommend that Atlas be
configured to use distributed HBase as the backing store for JanusGraph. Doing this implies that you could benefit from the
HA guarantees HBase provides. In order to configure Atlas to use HBase in HA mode, do the following:
* Choose an existing HBase cluster that is set up in HA mode to configure in Atlas (OR) Set up a new HBase cluster in [[http://hbase.apache.org/book.html#quickstart_fully_distributed][HA mode]].
---++ Index Store
As described above, Atlas indexes metadata through JanusGraph to support full text search queries. In order to provide HA
for the index store, we recommend that Atlas be configured to use Solr as the backing index store for JanusGraph. In order
to configure Atlas to use Solr in HA mode, do the following:
* Choose an existing !SolrCloud cluster setup in HA mode to configure in Atlas (OR) Set up a new [[https://cwiki.apache.org/confluence/display/solr/SolrCloud][SolrCloud cluster]].
---++ Known Issues
* If the HBase region servers hosting the Atlas table are down, Atlas would not be able to store or retrieve metadata from HBase until they are brought back online.
Once the build successfully completes, artifacts can be packaged for deployment.
<verbatim>
mvn clean package -Pdist
</verbatim>
NOTES:
* Use option '-DskipTests' to skip running unit and integration tests
* Use option '-P perf' to instrument Atlas to collect performance metrics
To create an Apache Atlas package for deployment in an environment having functional HBase and Solr instances, build with the external-hbase-solr profile:
<verbatim>
mvn clean package -Pdist,external-hbase-solr
</verbatim>
When the external-hbase-solr profile is used, Atlas is built for an environment having functional HBase and Solr instances, and the following steps need to be completed to make Atlas functional:
* Configure atlas.graph.storage.hostname (see "Graph persistence engine - HBase" in the [[Configuration][Configuration]] section).
* Configure atlas.graph.index.search.solr.zookeeper-url (see "Graph Search Index - Solr" in the [[Configuration][Configuration]] section).
* Set HBASE_CONF_DIR to point to a valid HBase config directory (see "Graph persistence engine - HBase" in the [[Configuration][Configuration]] section).
* Create the SOLR indices (see "Graph Search Index - Solr" in the [[Configuration][Configuration]] section).
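As an illustration, the SOLR indices can be created with the Solr control script; the paths, shard count and replication factor below are placeholders to adjust for your !SolrCloud setup:
<verbatim>
<solr-home>/bin/solr create -c vertex_index -shards 2 -replicationFactor 2
<solr-home>/bin/solr create -c edge_index -shards 2 -replicationFactor 2
<solr-home>/bin/solr create -c fulltext_index -shards 2 -replicationFactor 2
</verbatim>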
---+++ Packaging Atlas with Embedded HBase & Solr
To create an Apache Atlas package that includes HBase and Solr, build with the embedded-hbase-solr profile as shown below:
<verbatim>
mvn clean package -Pdist,embedded-hbase-solr
</verbatim>
Using the embedded-hbase-solr profile will configure Atlas so that an HBase instance and a Solr instance will be started and stopped along with the Atlas server by default.

---+++ Packaging Atlas with BerkeleyDB & Elasticsearch
Atlas also supports building a distribution that can use BerkeleyDB and Elasticsearch as the graph and index backends. To build a distribution that is configured for these backends, build with the berkeley-elasticsearch profile.
An additional step is required for the binary built using this profile to be used along with the Atlas distribution.
Due to licensing requirements, Atlas does not bundle the BerkeleyDB Java Edition in the tarball.
You can download the Berkeley DB jar file from the URL: <verbatim>http://download.oracle.com/otn/berkeley-db/je-5.0.73.zip</verbatim>
and copy the je-5.0.73.jar to the ${atlas_home}/libext directory.

The tar can be found in atlas/distro/target/apache-atlas-${project.version}-bin.tar.gz
---+++ Apache Atlas Package
Build will create the following files, which are used to install Apache Atlas.

Note that if the embedded-hbase-solr profile is specified for the build, then HBase and Solr are included in the distribution.
In this case, a standalone instance of HBase can be started as the default storage backend for the graph repository.
During Atlas installation, conf/hbase/hbase-site.xml.template gets expanded and moved to hbase/conf/hbase-site.xml
for the initial standalone HBase configuration. To configure ATLAS graph persistence for a different HBase instance, please see "Graph persistence engine - HBase" in the [[Configuration][Configuration]] section.
Also, a standalone instance of Solr can be started as the default search indexing backend. To configure ATLAS search indexing for a different Solr instance, please see "Graph Search Index - Solr" in the [[Configuration][Configuration]] section.
To build a distribution without minified js/css files, build with the skipMinify profile.
<verbatim>
mvn clean package -Pdist,skipMinify
</verbatim>
Note that by default js and css files are minified.
---+++ Installing & Running Atlas
<verbatim>
tar -xzvf apache-atlas-${project.version}-bin.tar.gz
cd atlas-${project.version}
</verbatim>
---++++ Configuring Atlas
By default the config directory used by Atlas is {package dir}/conf. To override this, set the environment variable ATLAS_CONF to the path of the conf dir.

Environment variables needed to run Atlas can be set in the atlas-env.sh file in the conf directory. This file will be sourced by Atlas scripts before any commands are executed. The following environment variables are available to set.
<verbatim>
# The java implementation to use. If JAVA_HOME is not found we expect java and jar to be in path
# java heap size we want to set for the atlas server. Default is 1024MB
#export ATLAS_SERVER_HEAP=
# What is considered as atlas home dir. Default is the base location of the installed software
#export ATLAS_HOME_DIR=
# Where log files are stored. Default is logs directory under the base install location
# Where pid files are stored. Default is logs directory under the base install location
#export ATLAS_PID_DIR=
# Where the atlas graph db data is stored. Default is logs/data directory under the base install location
#export ATLAS_DATA_DIR=
# Where do you want to expand the war file. By default it is in /server/webapp dir under the base install dir.
#export ATLAS_EXPANDED_WEBAPP_DIR=
</verbatim>
*Settings to support large number of metadata objects*

If you plan to store several tens of thousands of metadata objects, it is recommended that you use values tuned for better GC performance of the JVM.
The following values are common server side options:
The =-XX:SoftRefLRUPolicyMSPerMB= option was found to be particularly helpful to regulate GC performance for query heavy workloads with many concurrent users.
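As an illustration, such GC-related options might be set via ATLAS_SERVER_OPTS in =atlas-env.sh= along these lines (the exact flags and values are a starting point to be tuned per deployment, not prescriptive settings):

<verbatim>
export ATLAS_SERVER_OPTS="-server -XX:SoftRefLRUPolicyMSPerMB=0 -XX:+CMSClassUnloadingEnabled -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled"
</verbatim>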
*HBase as the Storage Backend for the Graph Repository*
By default, Atlas uses JanusGraph as the graph repository; it is currently the only graph repository implementation available. The HBase versions currently supported are 1.1.x. For configuring Atlas graph persistence on HBase, please see "Graph persistence engine - HBase" in the [[Configuration][Configuration]] section for more details.
Pre-requisites for running HBase as a distributed cluster
* 3 or 5 !ZooKeeper nodes
* At least 3 !RegionServer nodes. It would be ideal to run the !DataNodes on the same hosts as the Region servers for data locality.

HBase tables used by Atlas can be set using the following configuration in ATLAS_HOME/conf/atlas-application.properties:
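For example (the table names shown here are illustrative defaults and may differ in your deployment):

<verbatim>
atlas.graph.storage.hbase.table=apache_atlas_janus
atlas.audit.hbase.tablename=apache_atlas_entity_audit
</verbatim>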
*Configuring SOLR as the Indexing Backend for the Graph Repository*
By default, Atlas uses JanusGraph as the graph repository; it is currently the only graph repository implementation available. For configuring JanusGraph to work with Solr, please follow the instructions below.
* Install Solr if not already running. The Solr version supported is 5.5.1. It can be installed from http://archive.apache.org/dist/lucene/solr/5.5.1/solr-5.5.1.tgz
* Start Solr in cloud mode.
!SolrCloud mode uses a !ZooKeeper Service as a highly available, central location for cluster management.
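For example, Solr could be started in cloud mode as follows (the !ZooKeeper host/port and Solr port are placeholders for your environment):

<verbatim>
$SOLR_HOME/bin/solr start -c -z <zookeeper_host:port> -p 8983
</verbatim>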
* Run the following commands from the SOLR_BIN (e.g. $SOLR_HOME/bin) directory to create collections in Solr corresponding to the indexes that Atlas uses. If the Atlas and Solr instances are on two different hosts, first copy the required configuration files from ATLAS_HOME/conf/solr on the Atlas host to the Solr host. SOLR_CONF in the commands below refers to the directory where the Solr configuration files have been copied to on the Solr host:
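The collection-creation commands typically take the following shape (the shard and replication values are placeholders; vertex_index, edge_index and fulltext_index are the index names Atlas uses):

<verbatim>
$SOLR_BIN/solr create -c vertex_index -d SOLR_CONF -shards <numShards> -replicationFactor <replicationFactor>
$SOLR_BIN/solr create -c edge_index -d SOLR_CONF -shards <numShards> -replicationFactor <replicationFactor>
$SOLR_BIN/solr create -c fulltext_index -d SOLR_CONF -shards <numShards> -replicationFactor <replicationFactor>
</verbatim>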
Note: If numShards and replicationFactor are not specified, they default to 1 which suffices if you are trying out Solr with Atlas on a single node instance.
Otherwise specify numShards according to the number of hosts that are in the Solr cluster and the maxShardsPerNode configuration.
atlas.graph.index.search.solr.mode=cloud
atlas.graph.index.search.solr.zookeeper-url=<the ZK quorum setup for solr as comma separated value> eg: 10.1.6.4:2181,10.1.6.5:2181
atlas.graph.index.search.solr.zookeeper-connect-timeout=<SolrCloud Zookeeper Connection Timeout>. Default value is 60000 ms
atlas.graph.index.search.solr.zookeeper-session-timeout=<SolrCloud Zookeeper Session Timeout>. Default value is 60000 ms
</verbatim>
* Restart Atlas
For more information on JanusGraph Solr configuration, please refer to http://docs.janusgraph.org/0.2.0/solr.html
Pre-requisites for running Solr in cloud mode
* Memory - Solr is both memory and CPU intensive. Make sure the server running Solr has adequate memory, CPU and disk.
for these details.
---++++ Setting up Atlas
There are a few steps that setup dependencies of Atlas. One such example is setting up the JanusGraph schema in the storage backend of choice. In a simple single server setup, these are automatically setup with default configuration when the server first accesses these dependencies.

However, there are scenarios when we may want to run setup steps explicitly as one-time operations. For example, in a multiple server scenario using [[HighAvailability][High Availability]], it is preferable to run setup steps from one of the server instances the first time, and then start the services.
To run these steps one time, execute the command =bin/atlas_start.py -setup= from a single Atlas server instance.
However, the Atlas server does take care of parallel executions of the setup steps. Also, running the setup steps multiple times is idempotent. Therefore, if one chooses to run the setup steps as part of server startup, for convenience, then they should enable the configuration option =atlas.server.run.setup.on.start= by defining it with the value =true= in the =atlas-application.properties= file.
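For example, to run setup as part of server startup, the following line would be added to =atlas-application.properties=:

<verbatim>
atlas.server.run.setup.on.start=true
</verbatim>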
---++++ Starting Atlas Server
<verbatim>
bin/atlas_start.py [-port <port>]
</verbatim>
By default, the Atlas server starts with conf from {package dir}/conf. To override this (to use the same conf with multiple Atlas upgrades), set the environment variable ATLAS_CONF to the path of the conf dir. To change the port, use the -port option.

Once Atlas is started, you can view the status of Atlas entities using the Web-based dashboard. You can open your browser at the corresponding port to use the web UI.
            "typeName": "hive_db",
            "guid": "5d900c19-094d-4681-8a86-4eb1d6ffbe89",
            "status": "ACTIVE",
            "displayText": "default",
            "classificationNames": [],
            "attributes": {
                "owner": "public",
                "createTime": null,
                "qualifiedName": "default@cl1",
                "name": "default",
                "description": "Default Hive database"
            }
        }
    ]
}
</verbatim>
---+++ Stopping Atlas Server
<verbatim>
bin/atlas_stop.py
</verbatim>
---+++ Troubleshooting

---++++ Setup issues
If the setup of Atlas service fails due to any reason, the next run of setup (either by an explicit invocation of =atlas_start.py -setup= or by enabling the configuration option =atlas.server.run.setup.on.start=) will fail with a message such as =A previous setup run may not have completed cleanly.=. In such cases, you would need to manually ensure the setup can run and delete the Zookeeper node at =/apache_atlas/setup_in_progress= before attempting to run setup again.
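For example, the node could be removed with the !ZooKeeper CLI (the connect string below is an assumption for your environment):

<verbatim>
$ZOOKEEPER_HOME/bin/zkCli.sh -server localhost:2181 delete /apache_atlas/setup_in_progress
</verbatim>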
If the setup failed due to HBase JanusGraph schema setup errors, it may be necessary to repair the HBase schema. If no data has been stored, one can also disable and drop the HBase tables used by Atlas and run setup again.
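For example, from the HBase shell (the table name shown is an assumption; use the table name your deployment is configured with, and only do this if no data needs to be preserved):

<verbatim>
disable 'apache_atlas_janus'
drop 'apache_atlas_janus'
</verbatim>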
* Entity and Classification types can ‘extend’ from other types, called ‘supertypes’ - by virtue of this, they include the attributes that are defined in the supertype as well. This allows modellers to define common attributes across a set of related types. This is similar to how Object Oriented languages define super classes for a class. It is also possible for a type in Atlas to extend from multiple super types.
* In this example, every hive table extends from a pre-defined supertype called a ‘DataSet’. More details about these pre-defined types will be provided later.
* Types which have a metatype of ‘Entity’, ‘Struct’, ‘Classification’ or ‘Relationship’ can have a collection of attributes. Each attribute has a name (e.g. ‘name’) and some other associated properties. A property can be referred to using an expression type_name.attribute_name. It is also good to note that attributes themselves are defined using Atlas metatypes.
* In this example, hive_table.name is a String, hive_table.aliases is an array of Strings, hive_table.db refers to an instance of a type called hive_db and so on.
* Type references in attributes (like hive_table.db) are particularly interesting. Using such an attribute, we can define arbitrary relationships between two types defined in Atlas and thus build rich models. One can also collect a list of references as an attribute type (e.g. hive_table.columns, which represents a list of references from hive_table to the hive_column type).
---++ Entities
An ‘entity’ in Atlas is a specific value or instance of an Entity ‘type’ and thus represents a specific metadata object in the real world. Referring back to our analogy of Object Oriented Programming languages, an ‘instance’ is an ‘Object’ of a certain ‘Class’.
An example of an entity will be a specific Hive Table. Say Hive has a table called ‘customers’ in the ‘default’ database. This table will be an ‘entity’ in Atlas of type hive_table. By virtue of being an instance of an entity type, it will have values for every attribute that are a part of the Hive table ‘type’, such as:
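A minimal sketch of such an entity follows (the attribute values and the db GUID are illustrative and abbreviated; a real hive_table entity carries many more attributes):

<verbatim>
id:         "9ba387dd-fa76-429c-b791-ffc338d3c91f"
typeName:   "hive_table"
values:
    name:      "customers"
    db:        "b42c6cfc-c1e7-42fd-a9e6-890e0adf33bc"
    owner:     "admin"
    tableType: "MANAGED_TABLE"
</verbatim>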
The following points can be noted from the example above:
* Every instance of an entity type is identified by a unique identifier, a GUID. This GUID is generated by the Atlas server when the object is defined, and remains constant for the entire lifetime of the entity. At any point in time, this particular entity can be accessed using its GUID.
* In this example, the ‘customers’ table in the default database is uniquely identified by the GUID "9ba387dd-fa76-429c-b791-ffc338d3c91f"
* An entity is of a given type, and the name of the type is provided with the entity definition.
* In this example, the ‘customers’ table is a ‘hive_table’.
* The values of this entity are a map of all the attribute names and their values for attributes that are defined in the hive_table type definition.
* Attribute values will be according to the datatype of the attribute. Entity-type attributes will have a value of type AtlasObjectId.
* Collection datatypes hold an array or map of values of the contained datatype. E.g. parameters = { “transient_lastDdlTime”: “1466403208”}

With this idea on entities, we can now see the difference between Entity and Struct metatypes. Entities and Structs both compose attributes of other types. However, instances of Entity types have an identity (with a GUID value) and can be referenced from other entities (like a hive_db entity is referenced from a hive_table entity). Instances of Struct types do not have an identity of their own. The value of a Struct type is a collection of attributes that are ‘embedded’ inside the entity itself.
---++ Attributes
We already saw that attributes are defined inside metatypes like Entity, Struct, Classification and Relationship. But we simplistically referred to attributes as having a name and a metatype value. However, attributes in Atlas have some more properties that define more concepts related to the type system.
An attribute has the following properties:
<verbatim>
name: string,
typeName: string,
isOptional: boolean,
isIndexable: boolean,
isUnique: boolean,
cardinality: enum
</verbatim>
The properties above have the following meanings:
* isIndexable -
* This flag indicates whether this property should be indexed on, so that look ups can be performed using the attribute value as a predicate and can be performed efficiently.
* isUnique -
* This flag is again related to indexing. If specified to be unique, it means that a special index is created for this attribute in JanusGraph that allows for equality based look ups.
* Any attribute with a true value for this flag is treated like a primary key to distinguish this entity from other entities. Hence care should be taken to ensure that this attribute does model a unique property in the real world.
* For example, consider the name attribute of a hive_table. In isolation, a name is not a unique attribute for a hive_table, because tables with the same name can exist in multiple databases. Even a pair of (database name, table name) is not unique if Atlas is storing metadata of hive tables amongst multiple clusters. Only a cluster location, database name and table name can be deemed unique in the physical world.
* multiplicity - indicates whether this attribute is required, optional, or could be multi-valued. If an entity’s definition of the attribute value does not match the multiplicity declaration in the type definition, this would be a constraint violation and the entity addition will fail. This field can therefore be used to define some constraints on the metadata information.
Let us look at the attribute called ‘db’ which represents the database to which the hive table belongs:
<verbatim>
db:
"name": "db",
"typeName": "hive_db",
"isOptional": false,
"isIndexable": true,
"isUnique": false,
"cardinality": "SINGLE"
</verbatim>
Note the “isOptional=false” constraint - a table entity cannot be created without a db reference.
The properties for configuring service authentication are:
* <code>atlas.authentication.keytab</code> - the path to the keytab file.
* <code>atlas.authentication.principal</code> - the principal to use for authenticating to the KDC. The principal is generally of the form "user/host@realm". You may use the '_HOST' token for the hostname and the local hostname will be substituted in by the runtime (e.g. "Atlas/_HOST@EXAMPLE.COM").
Note that when Atlas is configured with HBase as the storage backend in a secure cluster, the graph db (JanusGraph) needs sufficient user permissions to be able to create and access an HBase table. To grant the appropriate permissions see [[Configuration][Graph persistence engine - Hbase]].
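For instance, permissions might be granted from the HBase shell along these lines (the user and table names are assumptions for your environment):

<verbatim>
grant 'atlas', 'RWXCA', 'apache_atlas_janus'
</verbatim>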