#For distributed mode, specify the ZooKeeper quorum here. For more information, refer to http://s3.thinkaurelius.com/docs/titan/current/hbase.html#_remote_server_mode_2
The HBASE_CONF_DIR environment variable needs to be set to point to the HBase client configuration directory, which is added to the classpath when Atlas starts up.
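For example, assuming the HBase client configuration lives under /etc/hbase/conf, the variable can be exported in the environment (or in a startup script such as atlas-env.sh) before starting Atlas; the path below is a placeholder for your cluster:
<verbatim>
# example only; point this at your cluster's HBase client configuration directory
export HBASE_CONF_DIR=/etc/hbase/conf
</verbatim>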
hbase-site.xml needs to have the following properties set according to the cluster setup:
<verbatim>
#Set below to /hbase-secure if the HBase server is set up in secure mode
zookeeper.znode.parent=/hbase-unsecure
</verbatim>
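Within hbase-site.xml itself, the same setting is expressed as an XML property. A minimal sketch for an unsecure cluster:
<verbatim>
<!-- sketch of the corresponding hbase-site.xml entry; use /hbase-secure for a secure cluster -->
<property>
  <name>zookeeper.znode.parent</name>
  <value>/hbase-unsecure</value>
</property>
</verbatim>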
Advanced configuration
# If you plan to use any of the configurations mentioned below, they need to be prefixed with "atlas.graph." to take effect in ATLAS
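For example, Titan's storage.hostname option would be set in the Atlas application properties with the prefix applied; the hostnames below are placeholders:
<verbatim>
# Titan option storage.hostname, prefixed with atlas.graph. (hostnames are placeholders)
atlas.graph.storage.hostname=zk-host1,zk-host2,zk-host3
</verbatim>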
* HBase as the Storage Backend for the Graph Repository
By default, Atlas uses Titan as the graph repository; it is currently the only graph repository implementation available.
The HBase versions currently supported are 0.98.x, 1.0.x and 1.1.x. For configuring ATLAS graph persistence on HBase, please refer to the "Configuration - Graph persistence engine - HBase" section for more details.
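As a reference, a minimal sketch of the relevant properties in the Atlas configuration file is shown below; the ZooKeeper hostnames are placeholders for your cluster:
<verbatim>
# use HBase as the Titan storage backend
atlas.graph.storage.backend=hbase
# for distributed mode, the ZooKeeper quorum of the HBase cluster (hostnames are placeholders)
atlas.graph.storage.hostname=zk-host1,zk-host2,zk-host3
</verbatim>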
Pre-requisites for running HBase as a distributed cluster
* 3 or 5 ZooKeeper nodes
* At least 3 RegionServer nodes. It is ideal to run the DataNodes on the same hosts as the RegionServers for data locality.
* Configuring SOLR as the Indexing Backend for the Graph Repository
By default, Atlas uses Titan as the graph repository; it is currently the only graph repository implementation available.
...
...
For configuring Titan to work with Solr, please follow the instructions below
For more information on Titan Solr configuration, please refer to http://s3.thinkaurelius.com/docs/titan/0.5.4/solr.htm
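As an illustration, a minimal sketch of the Solr-related entries in the Atlas configuration file when running Solr in cloud mode; the ZooKeeper URL is a placeholder:
<verbatim>
# use Solr (in SolrCloud mode) as the Titan index backend
atlas.graph.index.search.backend=solr5
atlas.graph.index.search.solr.mode=cloud
# ZooKeeper ensemble used by SolrCloud; hosts and port are placeholders
atlas.graph.index.search.solr.zookeeper-url=zk-host1:2181,zk-host2:2181,zk-host3:2181
</verbatim>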
Pre-requisites for running Solr in cloud mode
* Memory - Solr is both memory and CPU intensive. Make sure the server running Solr has adequate memory, CPU and disk.
Solr works well with 32 GB RAM. Plan to provide as much memory as possible to the Solr process.
* Disk - If the number of entities that need to be stored is large, plan to have at least 500 GB of free space in the volume where Solr is going to store the index data.
* SolrCloud has support for replication and sharding. It is highly recommended to use SolrCloud with at least two Solr nodes running on different servers with replication enabled.
If using SolrCloud, you also need ZooKeeper installed and configured with 3 or 5 ZooKeeper nodes (see the sketch after this list).
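As an illustrative sketch, with an external ZooKeeper ensemble already running, Solr nodes can be started in cloud mode and the index collections used by Atlas created roughly as follows; the hostnames, port, shard count and replication factor are placeholders to adjust for your cluster:
<verbatim>
# start a Solr node in cloud mode against the external ZooKeeper ensemble (hosts are placeholders)
bin/solr start -cloud -z zk-host1:2181,zk-host2:2181,zk-host3:2181 -p 8983
# create the index collections used by Atlas; adjust -shards and -replicationFactor as needed
bin/solr create -c vertex_index -shards 2 -replicationFactor 2
bin/solr create -c edge_index -shards 2 -replicationFactor 2
bin/solr create -c fulltext_index -shards 2 -replicationFactor 2
</verbatim>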