---++ Building & Installing Apache Atlas

---+++ Building Atlas

<verbatim>
git clone https://git-wip-us.apache.org/repos/asf/incubator-atlas.git atlas

cd atlas

export MAVEN_OPTS="-Xmx1024m -XX:MaxPermSize=512m" && mvn clean install
</verbatim>

Once the build successfully completes, artifacts can be packaged for deployment.

<verbatim>

mvn clean package -Pdist

</verbatim>

The tar archive can be found in atlas/distro/target/apache-atlas-${project.version}-bin.tar.gz

The tar archive is structured as follows:

<verbatim>

|- bin
   |- atlas_start.py
   |- atlas_stop.py
   |- atlas_config.py
   |- quick_start.py
   |- cputil.py
|- conf
   |- atlas-application.properties
   |- atlas-env.sh
   |- log4j.xml
   |- solr
      |- currency.xml
      |- lang
         |- stopwords_en.txt
      |- protowords.txt
      |- schema.xml
      |- solrconfig.xml
      |- stopwords.txt
      |- synonyms.txt
|- docs
|- server
   |- webapp
      |- atlas.war
|- README
|- NOTICE.txt
|- LICENSE.txt
|- DISCLAIMER.txt
|- CHANGES.txt

</verbatim>

---+++ Installing & Running Atlas

*Installing Atlas*
<verbatim>
tar -xzvf apache-atlas-${project.version}-bin.tar.gz

cd atlas-${project.version}
</verbatim>

*Configuring Atlas*

By default, the config directory used by Atlas is {package dir}/conf. To override this, set the environment variable ATLAS_CONF to the path of the conf dir.
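
For example, a minimal sketch of pointing Atlas at an alternate conf directory (the path is a placeholder, not a default):
<verbatim>
export ATLAS_CONF=/etc/atlas/conf
</verbatim>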

atlas-env.sh has been added to the Atlas conf. This file can be used to set various environment variables that you need for your services. In addition, you can set any other environment
variables you might need. This file will be sourced by the atlas scripts before any commands are
executed. The following environment variables are available to set:

<verbatim>
# The java implementation to use. If JAVA_HOME is not found we expect java and jar to be in path
#export JAVA_HOME=

# any additional java opts you want to set. This will apply to both client and server operations
#export ATLAS_OPTS=

# any additional java opts that you want to set for client only
#export ATLAS_CLIENT_OPTS=

# java heap size we want to set for the client. Default is 1024MB
#export ATLAS_CLIENT_HEAP=

# any additional opts you want to set for atlas service.
#export ATLAS_SERVER_OPTS=

# java heap size we want to set for the atlas server. Default is 1024MB
#export ATLAS_SERVER_HEAP=

# What is considered as atlas home dir. Default is the base location of the installed software
#export ATLAS_HOME_DIR=

# Where log files are stored. Default is logs directory under the base install location
#export ATLAS_LOG_DIR=

# Where pid files are stored. Default is logs directory under the base install location
#export ATLAS_PID_DIR=

# where the atlas titan db data is stored. Default is logs/data directory under the base install location
#export ATLAS_DATA_DIR=

# Where do you want to expand the war file. By default it is in /server/webapp dir under the base install dir.
#export ATLAS_EXPANDED_WEBAPP_DIR=
</verbatim>
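
For instance, a hypothetical customization of atlas-env.sh could set the Java installation and relocate the log and pid directories (the paths below are placeholders, not defaults):
<verbatim>
export JAVA_HOME=/usr/lib/jvm/java-1.8.0   # placeholder JDK path
export ATLAS_LOG_DIR=/var/log/atlas        # placeholder log directory
export ATLAS_PID_DIR=/var/run/atlas        # placeholder pid directory
</verbatim>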


*NOTE for Mac OS users*
If you are using Mac OS, you will need to configure ATLAS_SERVER_OPTS (explained above).

In {package dir}/conf/atlas-env.sh, uncomment the following line
<verbatim>
#export ATLAS_SERVER_OPTS=
</verbatim>

and change it to look as below
<verbatim>
export ATLAS_SERVER_OPTS="-Djava.awt.headless=true -Djava.security.krb5.realm= -Djava.security.krb5.kdc="
</verbatim>

*HBase as the Storage Backend for the Graph Repository*

By default, Atlas uses Titan as the graph repository; it is currently the only graph repository implementation available.
The HBase versions currently supported are 1.1.x. For configuring Atlas graph persistence on HBase, please see "Graph persistence engine - HBase" in the [[Configuration][Configuration]] section
for more details.
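
As an illustrative sketch only (the Configuration section is authoritative), pointing the graph store at HBase typically involves properties along these lines in ATLAS_HOME/conf/atlas-application.properties; the ZooKeeper quorum value is a placeholder:
<verbatim>
 atlas.graph.storage.backend=hbase
 atlas.graph.storage.hostname=<the ZK quorum used by HBase as comma separated value> eg: 10.1.6.4,10.1.6.5
</verbatim>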

Pre-requisites for running HBase as a distributed cluster
   * 3 or 5 !ZooKeeper nodes
   * At least 3 !RegionServer nodes. It would be ideal to run the !DataNodes on the same hosts as the Region servers for data locality.

*Configuring SOLR as the Indexing Backend for the Graph Repository*

By default, Atlas uses Titan as the graph repository; it is currently the only graph repository implementation available.
For configuring Titan to work with Solr, please follow the instructions below.

   * Install Solr if not already running. The version of Solr supported is 5.2.1. It can be installed from http://archive.apache.org/dist/lucene/solr/5.2.1/solr-5.2.1.tgz

   * Start Solr in cloud mode.
  !SolrCloud mode uses a !ZooKeeper Service as a highly available, central location for cluster management.
  For a small cluster, running with an existing !ZooKeeper quorum should be fine. For larger clusters, you would want to run a separate !ZooKeeper quorum with at least 3 servers.
  Note: Atlas currently supports Solr in "cloud" mode only. "http" mode is not supported. For more information, refer to the Solr documentation - https://cwiki.apache.org/confluence/display/solr/SolrCloud

   * For example, to bring up a Solr node listening on port 8983 on a machine, you can use the command:
      <verbatim>
      $SOLR_HOME/bin/solr start -c -z <zookeeper_host:port> -p 8983
      </verbatim>

   * Run the following commands from the SOLR_HOME directory to create collections in Solr corresponding to the indexes that Atlas uses. If the Atlas and Solr instances are on two different hosts,
  first copy the required configuration files from ATLAS_HOME/conf/solr on the Atlas host to the Solr host (one way to do this is sketched below). SOLR_CONF in the commands below refers to the directory
  on the Solr host where the Solr configuration files have been copied.
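
  A minimal sketch of the copy step, assuming the Solr host is reachable as solr-host and SOLR_CONF is a directory of your choosing (both names are placeholders):
<verbatim>
  scp -r ATLAS_HOME/conf/solr/* solr-host:SOLR_CONF/
</verbatim>
  With the configuration in place on the Solr host, create the collections: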

<verbatim>
  bin/solr create -c vertex_index -d SOLR_CONF -shards #numShards -replicationFactor #replicationFactor
  bin/solr create -c edge_index -d SOLR_CONF -shards #numShards -replicationFactor #replicationFactor
  bin/solr create -c fulltext_index -d SOLR_CONF -shards #numShards -replicationFactor #replicationFactor
</verbatim>

  Note: If numShards and replicationFactor are not specified, they default to 1, which suffices if you are trying out Solr with Atlas on a single node instance.
  Otherwise specify numShards according to the number of hosts that are in the Solr cluster and the maxShardsPerNode configuration.
  The number of shards cannot exceed the total number of Solr nodes in your !SolrCloud cluster.

  The number of replicas (replicationFactor) can be set according to the redundancy required; a worked example follows below.
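
  For example, on a hypothetical two-node !SolrCloud cluster, the vertex index could be created with two shards and two replicas (the values are illustrative, not recommendations):
<verbatim>
  bin/solr create -c vertex_index -d SOLR_CONF -shards 2 -replicationFactor 2
</verbatim>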

   * Change the Atlas configuration to point to the Solr instance setup. Please make sure the following configurations are set to the values below in ATLAS_HOME/conf/atlas-application.properties
<verbatim>
 atlas.graph.index.search.backend=solr5
 atlas.graph.index.search.solr.mode=cloud
 atlas.graph.index.search.solr.zookeeper-url=<the ZK quorum setup for solr as comma separated value> eg: 10.1.6.4:2181,10.1.6.5:2181
</verbatim>

   * Restart Atlas

For more information on Titan Solr configuration, please refer to http://s3.thinkaurelius.com/docs/titan/0.5.4/solr.htm

Pre-requisites for running Solr in cloud mode
  * Memory - Solr is both memory and CPU intensive. Make sure the server running Solr has adequate memory, CPU and disk.
    Solr works well with 32GB RAM. Plan to provide as much memory as possible to the Solr process.
  * Disk - If the number of entities that need to be stored is large, plan to have at least 500 GB free space in the volume where Solr is going to store the index data.
  * !SolrCloud has support for replication and sharding. It is highly recommended to use !SolrCloud with at least two Solr nodes running on different servers with replication enabled.
    If using !SolrCloud, then you also need !ZooKeeper installed and configured with 3 or 5 !ZooKeeper nodes.

*Starting Atlas Server*
<verbatim>
bin/atlas_start.py [-port <port>]
</verbatim>

By default,
   * the server starts on port 21000 (the port used in the examples below). To change the port, use the -port option, as in the sketch below.
   * the atlas server starts with conf from {package dir}/conf. To override this (to use the same conf with multiple atlas upgrades), set the environment variable ATLAS_CONF to the path of the conf dir.
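
For instance, a hypothetical invocation overriding the port (the port value is only an illustration):
<verbatim>
bin/atlas_start.py -port 21001
</verbatim>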

*Using Atlas*
   * Quick start model - sample model and data
<verbatim>
  bin/quick_start.py [<atlas endpoint>]
</verbatim>

   * Verify if the server is up and running
<verbatim>
  curl -v http://localhost:21000/api/atlas/admin/version
  {"Version":"v0.1"}
</verbatim>

   * List the types in the repository
<verbatim>
  curl -v http://localhost:21000/api/atlas/types
  {"results":["Process","Infrastructure","DataSet"],"count":3,"requestId":"1867493731@qtp-262860041-0 - 82d43a27-7c34-4573-85d1-a01525705091"}
</verbatim>

   * List the instances for a given type
<verbatim>
  curl -v http://localhost:21000/api/atlas/entities?type=hive_table
  {"requestId":"788558007@qtp-44808654-5","list":["cb9b5513-c672-42cb-8477-b8f3e537a162","ec985719-a794-4c98-b98f-0509bd23aac0","48998f81-f1d3-45a2-989a-223af5c1ed6e","a54b386e-c759-4651-8779-a099294244c4"]}

  curl -v http://localhost:21000/api/atlas/entities/list/hive_db
</verbatim>

   * Search for entities (instances) in the repository
<verbatim>
  curl -v http://localhost:21000/api/atlas/discovery/search/dsl?query="from hive_table"
</verbatim>


*Dashboard*

Once Atlas is started, you can view the status of Atlas entities using the Web-based dashboard. Open your browser at the port Atlas was started on to use the web UI, as shown below.
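
For example, with the default port used in the examples above, the dashboard would be reachable at:
<verbatim>
http://localhost:21000/
</verbatim>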


*Stopping Atlas Server*
<verbatim>
bin/atlas_stop.py
</verbatim>