---+ Configuring Apache Atlas

---++ Introduction

All configuration in Atlas uses Java properties-style configuration.
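To illustrate the format only, the sketch below parses a few of the properties used on this page with the JDK's java.util.Properties class. It is not necessarily how Atlas itself loads application.properties, and the class and method names are made up for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Illustration only: parses Atlas-style "key=value" lines with java.util.Properties.
// This is not claimed to be the loader Atlas uses for application.properties.
public class PropertiesSyntaxDemo {

    // Parses properties text; lines starting with '#' are treated as comments.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // StringReader does no real I/O, but load() declares IOException
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
                "# Graph persistence engine",
                "atlas.graph.storage.backend=berkeleyje",
                "atlas.graph.storage.directory=data/berkley");
        Properties props = parse(sample);
        // Prints: berkeleyje
        System.out.println(props.getProperty("atlas.graph.storage.backend"));
    }
}
```

One difference to keep in mind: java.util.Properties leaves placeholders such as ${sys:atlas.home} (used in the notification configs) as literal text, so any expansion of them depends on the library that actually loads the file.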

---++ Application Properties

The main configuration file is application.properties, located in the *conf* directory of the deployed
location. It consists of the following sections:

---+++ Graph Database Configs

---++++ Graph persistence engine

This section sets up the graph db (Titan) to use a persistence engine. Please refer to the
<a href="http://s3.thinkaurelius.com/docs/titan/0.5.4/titan-config-ref.html">Titan configuration reference</a> for more
details. The example below uses BerkeleyDB Java Edition (berkeleyje).

<verbatim>
atlas.graph.storage.backend=berkeleyje
atlas.graph.storage.directory=data/berkley
</verbatim>

---+++++ Graph persistence engine - HBase

Basic configuration

<verbatim>
atlas.graph.storage.backend=hbase
# For standalone mode, specify localhost
# For distributed mode, specify the ZooKeeper quorum here. For more information, refer to http://s3.thinkaurelius.com/docs/titan/current/hbase.html#_remote_server_mode_2
atlas.graph.storage.hostname=<ZooKeeper Quorum>
</verbatim>

The HBASE_CONF_DIR environment variable needs to point to the HBase client configuration directory, which is added to the classpath when Atlas starts up.
hbase-site.xml needs the following property set according to the cluster setup:
<verbatim>
# Set this to /hbase-secure if the HBase server is set up in secure mode
zookeeper.znode.parent=/hbase-unsecure
</verbatim>
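If Atlas does not seem to pick up the HBase settings, a quick check is whether hbase-site.xml is actually visible on the classpath. The small diagnostic below is a hypothetical helper (not part of Atlas) that prints HBASE_CONF_DIR and looks up hbase-site.xml through the system class loader. Run it with the classpath you expect Atlas to see, for example java -cp "$HBASE_CONF_DIR:." HBaseConfCheck.

```java
// Hypothetical diagnostic, not part of Atlas: reports HBASE_CONF_DIR and whether
// hbase-site.xml can be found on the classpath this JVM was started with.
public class HBaseConfCheck {

    // True if the named resource is visible to the system (application) class loader.
    static boolean onClasspath(String resource) {
        return ClassLoader.getSystemResource(resource) != null;
    }

    public static void main(String[] args) {
        // System.getenv returns null when the variable is not set
        System.out.println("HBASE_CONF_DIR=" + System.getenv("HBASE_CONF_DIR"));
        System.out.println("hbase-site.xml on classpath: " + onClasspath("hbase-site.xml"));
    }
}
```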

Advanced configuration

If you plan to use any of the configs described in the Titan reference below, prefix them with "atlas.graph." so that they take effect in Atlas.

Refer to http://s3.thinkaurelius.com/docs/titan/0.5.4/titan-config-ref.html#_storage_hbase for the HBase storage options.
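The prefix rule can be pictured as a simple mapping between Atlas keys and Titan keys. The sketch below is a hypothetical helper, not Atlas code: it selects every key that starts with "atlas.graph." and strips the prefix, giving the key names used in the Titan reference. storage.hbase.table is used here only as an example Titan option.

```java
import java.util.Properties;

// Hypothetical helper: collects every key that starts with a prefix such as
// "atlas.graph." and returns a copy with the prefix removed, i.e. the key
// names as they appear in the Titan configuration reference.
public class PrefixedConfig {

    static Properties subset(Properties source, String prefix) {
        Properties result = new Properties();
        for (String key : source.stringPropertyNames()) {
            if (key.startsWith(prefix)) {
                result.setProperty(key.substring(prefix.length()), source.getProperty(key));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Properties atlas = new Properties();
        atlas.setProperty("atlas.graph.storage.backend", "hbase");
        // Example Titan option written with the required "atlas.graph." prefix
        atlas.setProperty("atlas.graph.storage.hbase.table", "titan");
        // Not a graph key, so it is left out of the result
        atlas.setProperty("atlas.kafka.hook.group.id", "atlas");

        Properties titan = subset(atlas, "atlas.graph.");
        // Prints: titan
        System.out.println(titan.getProperty("storage.hbase.table"));
    }
}
```

A key written without the prefix (for example a bare storage.hbase.table) would simply not be selected, which is why the prefix is required for the setting to take effect.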

Permissions

When Atlas is configured with HBase as the storage backend, the graph db (Titan) needs sufficient user permissions to create and access an HBase table. In a secure cluster, it may be necessary to grant permissions to the 'atlas' user for the 'titan' table.

With Apache Ranger, a policy can be configured for the 'titan' table.

Without Ranger, the HBase shell can be used to set the permissions:

<verbatim>
   su hbase
   kinit -k -t <hbase keytab> <hbase principal>
   echo "grant 'atlas', 'RWXCA', 'titan'" | hbase shell
</verbatim>

---++++ Graph Search Index
This section sets up the graph db (Titan) to use a search indexing system. The example
configuration below sets it up to use an embedded Elasticsearch indexing system.

<verbatim>
atlas.graph.index.search.backend=elasticsearch
atlas.graph.index.search.directory=data/es
atlas.graph.index.search.elasticsearch.client-only=false
atlas.graph.index.search.elasticsearch.local-mode=true
atlas.graph.index.search.elasticsearch.create.sleep=2000
</verbatim>

---++++ Graph Search Index - Solr

<verbatim>
atlas.graph.index.search.backend=solr5
atlas.graph.index.search.solr.mode=cloud
# ZooKeeper quorum set up for Solr, as comma-separated values, e.g. 10.1.6.4:2181,10.1.6.5:2181
atlas.graph.index.search.solr.zookeeper-url=<ZooKeeper quorum for Solr>
</verbatim>

---+++ Hive Lineage Configs
Higher-layer services such as Hive lineage and schema are driven by the type system; this
section encodes the specific types for the Hive data model.

This model reflects the base super types for data (DataSet) and process (Process):
<verbatim>
atlas.lineage.hive.table.type.name=DataSet
atlas.lineage.hive.process.type.name=Process
atlas.lineage.hive.process.inputs.name=inputs
atlas.lineage.hive.process.outputs.name=outputs

## Schema
atlas.lineage.hive.table.schema.query=hive_table where name=?, columns
</verbatim>

---+++ Notification Configs
Refer to http://kafka.apache.org/documentation.html#configuration for Kafka configuration. All Kafka configs must be prefixed with 'atlas.kafka.'.

<verbatim>
atlas.notification.embedded=true
atlas.kafka.data=${sys:atlas.home}/data/kafka
atlas.kafka.zookeeper.connect=localhost:9026
atlas.kafka.bootstrap.servers=localhost:9027
atlas.kafka.zookeeper.session.timeout.ms=400
atlas.kafka.zookeeper.sync.time.ms=20
atlas.kafka.auto.commit.interval.ms=1000
atlas.kafka.hook.group.id=atlas
</verbatim>

Note that Kafka group ids are specified per topic. The Kafka group id configuration for entity notifications is 'atlas.kafka.entities.group.id':

<verbatim>
atlas.kafka.entities.group.id=<consumer id>
</verbatim>
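To picture how the 'atlas.kafka.' prefix and the per-topic group id fit together, here is a hypothetical sketch (not Atlas's implementation) that builds consumer properties for one notification type. It assumes that the group id for a type named X is read from 'atlas.kafka.X.group.id', following the 'atlas.kafka.entities.group.id' example above, and maps it to Kafka's standard group.id setting; the value "consumer-1" is a placeholder.

```java
import java.util.Properties;

// Hypothetical sketch: turns "atlas.kafka."-prefixed settings into properties a
// Kafka consumer for one notification type could use. Assumption: the type's
// group id lives under "atlas.kafka.<type>.group.id".
public class KafkaConsumerConfig {

    static final String PREFIX = "atlas.kafka.";

    static Properties forConsumer(Properties atlas, String type) {
        Properties kafka = new Properties();
        for (String key : atlas.stringPropertyNames()) {
            if (!key.startsWith(PREFIX)) {
                continue; // not a Kafka setting
            }
            String kafkaKey = key.substring(PREFIX.length());
            if (kafkaKey.endsWith(".group.id")) {
                continue; // per-type group ids are resolved separately below
            }
            // Every other atlas.kafka.* key is copied with the prefix removed
            kafka.setProperty(kafkaKey, atlas.getProperty(key));
        }
        String groupId = atlas.getProperty(PREFIX + type + ".group.id");
        if (groupId != null) {
            kafka.setProperty("group.id", groupId); // Kafka's standard consumer group key
        }
        return kafka;
    }

    public static void main(String[] args) {
        Properties atlas = new Properties();
        atlas.setProperty("atlas.kafka.bootstrap.servers", "localhost:9027");
        atlas.setProperty("atlas.kafka.hook.group.id", "atlas");
        atlas.setProperty("atlas.kafka.entities.group.id", "consumer-1"); // placeholder id

        Properties consumer = forConsumer(atlas, "entities");
        // Prints: consumer-1 localhost:9027
        System.out.println(consumer.getProperty("group.id") + " "
                + consumer.getProperty("bootstrap.servers"));
    }
}
```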


---+++ Client Configs
<verbatim>
atlas.client.readTimeoutMSecs=60000
atlas.client.connectTimeoutMSecs=60000
</verbatim>
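Both client values are in milliseconds. As a sketch of what they control, the example below applies them to a JDK HttpURLConnection through setConnectTimeout and setReadTimeout. This is an assumption for illustration: the Atlas client itself may use a different HTTP library, the URL is a placeholder, and the 60000 fallback only mirrors the value shown above rather than any built-in Atlas default.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.Properties;

// Illustrative sketch: reads the two Atlas client timeout properties and applies
// them to a plain JDK HttpURLConnection. Not the Atlas client's own code.
public class ClientTimeouts {

    // Reads a millisecond value, falling back to defaultMs when the key is absent.
    static int millis(Properties props, String key, int defaultMs) {
        return Integer.parseInt(props.getProperty(key, Integer.toString(defaultMs)).trim());
    }

    static void apply(HttpURLConnection conn, Properties props) {
        conn.setConnectTimeout(millis(props, "atlas.client.connectTimeoutMSecs", 60000));
        conn.setReadTimeout(millis(props, "atlas.client.readTimeoutMSecs", 60000));
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        props.setProperty("atlas.client.readTimeoutMSecs", "30000");

        // openConnection() does not contact the server; no request is sent here
        HttpURLConnection conn = (HttpURLConnection)
                URI.create("http://localhost:21000/").toURL().openConnection();
        apply(conn, props);
        // Prints: 60000 30000
        System.out.println(conn.getConnectTimeout() + " " + conn.getReadTimeout());
    }
}
```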


---+++ Security Properties

---++++ SSL config
The following property is used to toggle the SSL feature.

<verbatim>
atlas.enableTLS=false
</verbatim>
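When the SSL feature is enabled, the Atlas endpoint is served over HTTPS instead of HTTP. The sketch below is a hypothetical helper showing how a boolean toggle such as atlas.enableTLS might select the URL scheme on the client side; the host and port are placeholders, and the helper does not configure TLS itself.

```java
import java.util.Properties;

// Hypothetical sketch: picks the URL scheme from the atlas.enableTLS toggle.
public class TlsToggle {

    static boolean tlsEnabled(Properties props) {
        // parseBoolean is case-insensitive; any value other than "true" means false
        return Boolean.parseBoolean(props.getProperty("atlas.enableTLS", "false").trim());
    }

    static String baseUrl(Properties props, String hostAndPort) {
        return (tlsEnabled(props) ? "https://" : "http://") + hostAndPort;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("atlas.enableTLS", "true");
        // Prints: https://atlas.example.com:21443 (placeholder host and port)
        System.out.println(baseUrl(props, "atlas.example.com:21443"));
    }
}
```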