Notification is used for reliable entity registration from hooks and for entity/type change notifications. Atlas, by default, provides Kafka integration, but it is possible to provide other implementations as well. The Atlas service starts an embedded Kafka server by default.

Atlas also provides !NotificationHookConsumer, which runs in the Atlas service, listens to messages from the hooks and registers the entities in Atlas.
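For reference, a minimal sketch of the notification-related settings in atlas-application.properties is shown below; the endpoints are the usual embedded-mode defaults, but verify the property names and values against your installation.

{noformat}
# Set to false to point Atlas at an external Kafka cluster instead of the
# embedded server it starts by default
atlas.notification.embedded=true

# ZooKeeper and broker endpoints used by the hooks and by NotificationHookConsumer
atlas.kafka.zookeeper.connect=localhost:9026
atlas.kafka.bootstrap.servers=localhost:9027
{noformat}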
---+++ Choosing between Indexing Backends

Refer to http://s3.thinkaurelius.com/docs/titan/0.5.4/elasticsearch.html and http://s3.thinkaurelius.com/docs/titan/0.5.4/solr.html when choosing between !ElasticSearch and Solr.
Solr in cloud mode is the recommended setup.
---+++ Switching Persistence Backend

For switching the storage backend from BerkeleyDB to HBase and vice versa, refer to the documentation for "Graph Persistence Engine" described above and restart Atlas.
The data in the indexing backend needs to be cleared as well, else there will be discrepancies between the storage and indexing backends, which could result in errors during search.
!ElasticSearch runs in embedded mode by default, and its data can easily be cleared by deleting the ATLAS_HOME/data/es directory.
For Solr, the collections created during Atlas installation - vertex_index, edge_index and fulltext_index - can be deleted to clean up the indexes.
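As an illustration, the cleanup could look like the shell sketch below; the paths and the Solr delete syntax assume a default Solr 5.2.1 install, so adjust them for your environment.

{noformat}
# Embedded ElasticSearch: remove the index data directory
rm -rf ${ATLAS_HOME}/data/es

# SolrCloud: drop the collections created during Atlas installation
${SOLR_HOME}/bin/solr delete -c vertex_index
${SOLR_HOME}/bin/solr delete -c edge_index
${SOLR_HOME}/bin/solr delete -c fulltext_index
{noformat}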
To configure Atlas to use Kafka in HA mode, do the following:
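A hedged sketch of the relevant properties is shown below; the host names are placeholders and the exact steps should be verified against the HA documentation.

{noformat}
# Use an external, replicated Kafka cluster instead of the embedded server
atlas.notification.embedded=false
atlas.kafka.zookeeper.connect=zk1:2181,zk2:2181,zk3:2181
atlas.kafka.bootstrap.servers=broker1:9092,broker2:9092,broker3:9092
{noformat}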
---++ Known Issues

* [[https://issues.apache.org/jira/browse/ATLAS-338][ATLAS-338]]: Metadata events generated from the Hive CLI (as opposed to Beeline or any client going through !HiveServer2) would be lost if the Atlas server is down.
* If the HBase region servers hosting the Atlas 'titan' HTable are down, Atlas would not be able to store or retrieve metadata from HBase until they are brought back online.
The HBase versions currently supported are 1.1.x. For configuring Atlas graph persistence on HBase, refer to the "Graph Persistence Engine" documentation above for more details.
Pre-requisites for running HBase as a distributed cluster (a configuration sketch follows the list):
* 3 or 5 !ZooKeeper nodes
* At least 3 !RegionServer nodes. It would be ideal to run the !DataNodes on the same hosts as the !RegionServers for data locality.
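Once the cluster is up, pointing Atlas at it comes down to a couple of graph-persistence properties. A minimal sketch, assuming the Titan-over-HBase backend and a placeholder !ZooKeeper quorum (verify the property names against your Atlas version):

{noformat}
# Store the metadata graph in HBase
atlas.graph.storage.backend=hbase
# ZooKeeper quorum through which the HBase cluster is reached
atlas.graph.storage.hostname=zk1,zk2,zk3
{noformat}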
*Configuring SOLR as the Indexing Backend for the Graph Repository*
For configuring Titan to work with Solr, please follow the instructions below.

* Install Solr if it is not already running. The supported version of Solr is 5.2.1; it can be installed from http://archive.apache.org/dist/lucene/solr/5.2.1/solr-5.2.1.tgz
* Start Solr in cloud mode.
!SolrCloud mode uses a !ZooKeeper service as a highly available, central location for cluster management.
For a small cluster, running with an existing !ZooKeeper quorum should be fine. For larger clusters, you would want to run a separate !ZooKeeper quorum with at least 3 servers.
Note: Atlas currently supports Solr in "cloud" mode only. "http" mode is not supported. For more information, refer to the Solr documentation - https://cwiki.apache.org/confluence/display/solr/SolrCloud
* For example, to bring up a Solr node listening on port 8983 on a machine, a command along the lines of the sketch below can be used.
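This is a plausible invocation assuming Solr 5.2.1's bin/solr script and an existing !ZooKeeper ensemble; the hosts are placeholders.

{noformat}
# Start a Solr node in cloud mode on port 8983, registering with ZooKeeper
$SOLR_HOME/bin/solr start -cloud -z zk1:2181,zk2:2181,zk3:2181 -p 8983
{noformat}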
Note: If numShards and replicationFactor are not specified, they default to 1, which suffices if you are trying out Solr with Atlas on a single-node instance.
Otherwise, specify numShards according to the number of hosts in the Solr cluster and the maxShardsPerNode configuration.
The number of shards cannot exceed the total number of Solr nodes in your !SolrCloud cluster.
The number of replicas (replicationFactor) can be set according to the redundancy required.
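To make the note concrete, creating the three Atlas index collections might look like the sketch below; it assumes Solr 5.2.1's bin/solr script, a hypothetical 3-node cluster, and a SOLR_CONF directory containing the Solr configuration shipped with Atlas.

{noformat}
# Create the three index collections; numShards must not exceed the Solr node count
$SOLR_HOME/bin/solr create -c vertex_index -d $SOLR_CONF -shards 3 -replicationFactor 2
$SOLR_HOME/bin/solr create -c edge_index -d $SOLR_CONF -shards 3 -replicationFactor 2
$SOLR_HOME/bin/solr create -c fulltext_index -d $SOLR_CONF -shards 3 -replicationFactor 2
{noformat}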
Pre-requisites for running Solr in cloud mode:
* Memory - Solr is both memory and CPU intensive. Make sure the server running Solr has adequate memory, CPU and disk.
Solr works well with 32GB RAM. Plan to provide as much memory as possible to the Solr process.
* Disk - If the number of entities that need to be stored is large, plan to have at least 500 GB of free space in the volume where Solr is going to store the index data.
* !SolrCloud has support for replication and sharding. It is highly recommended to use !SolrCloud with at least two Solr nodes running on different servers with replication enabled.
If using !SolrCloud, then you also need !ZooKeeper installed and configured with 3 or 5 !ZooKeeper nodes.
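As a hedged illustration of the memory guidance, the heap of a Solr 5.2.1 node can be raised with the -m flag; the 16g figure below is an assumption, so size it to your hardware.

{noformat}
# Start Solr in cloud mode with a 16 GB heap
$SOLR_HOME/bin/solr start -cloud -m 16g -z zk1:2181,zk2:2181,zk3:2181 -p 8983
{noformat}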
Language Notes:
* A *!SingleQuery* expression can be used to search for entities of a _Trait_ or _Class_.
Entities can be filtered based on a 'Where Clause' and Entity Attributes can be retrieved based on a 'Select Clause'.
* An Entity Graph can be traversed/joined by combining one or more !SingleQueries.
* An attempt is made to make the expressions look SQL-like by accepting the keywords "SELECT", "FROM", and "WHERE"; but these are optional and users can simply think in terms of Entity Graph Traversals.
* The transitive closure of an Entity relationship can be expressed via the _Loop_ expression. A _Loop_ expression can be any traversal (recursively a query) that represents a _Path_ that ends in an Entity of the same _Type_ as the starting Entity.
* The _!WithPath_ clause can be used with transitive closure queries to retrieve the Path that connects the two related Entities. (We also provide a higher-level interface for Closure Queries; see the scaladoc for 'org.apache.atlas.query.ClosureQuery'.)
* There are a couple of Predicate functions different from SQL:
_isa_ can be used to filter entities that have a particular Trait, and _has_ can be used to filter entities that have a value for a particular attribute; both appear in the example queries below.

Example queries:
* from DB
* DB where name="Reporting" select name, owner
* DB has name
* DB is !JdbcAccess
* Column where Column isa PII
* Table where name="sales_fact", columns
* Table where name="sales_fact", columns as column select column.name, column.dataType, column.comment