Commit 3ed1f5a0 by nikhilbonte Committed by nixonrodrigues

ATLAS-3057:- Atlas Index Repair tool for JanusGraph

parent 4925ec35
<!--
~ Licensed to the Apache Software Foundation (ASF) under one
~ or more contributor license agreements. See the NOTICE file
~ distributed with this work for additional information
~ regarding copyright ownership. The ASF licenses this file
~ to you under the Apache License, Version 2.0 (the
~ "License"); you may not use this file except in compliance
~ with the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing, software
~ distributed under the License is distributed on an "AS IS" BASIS,
~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~ See the License for the specific language governing permissions and
~ limitations under the License.
-->
## Introduction
This document describes how to use the Atlas Index Repair Utility for JanusGraph, with HBase as the back-end data store and Solr as the index store.
#### Need for this Tool
In rare cases, an entity may be stored in the data store during entity creation while the corresponding indexes are not created in Solr. Because Atlas relies heavily on Solr for its Basic Search, such an entity will not be returned by Basic Search. Note that Advanced Search is not affected by this.
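
One way to confirm that an entity is affected is to compare the GUIDs known to exist in Atlas with those returned by Basic Search. The following is a minimal sketch, not part of the tool. It assumes an Atlas server at a hypothetical `http://localhost:21000` and the v2 Basic Search endpoint `GET /api/atlas/v2/search/basic`. Both helper names are illustrative, and sending the request with the installation's authentication is left out.

```python
import urllib.parse

# Hypothetical Atlas server URL; 21000 is the default Atlas HTTP port.
ATLAS_URL = "http://localhost:21000"


def basic_search_url(type_name, query, limit=100, atlas_url=ATLAS_URL):
    """Build the URL of a Basic Search request for entities of `type_name`."""
    params = urllib.parse.urlencode(
        {"typeName": type_name, "query": query, "limit": limit})
    return "%s/api/atlas/v2/search/basic?%s" % (atlas_url.rstrip("/"), params)


def guids_missing_from_search(expected_guids, search_result):
    """Return the GUIDs that are known to exist but are absent from a parsed
    Basic Search response; such entities are candidates for index repair."""
    # The "entities" key may be absent when the search returns nothing.
    found = {e.get("guid") for e in search_result.get("entities") or []}
    return [g for g in expected_guids if g not in found]
```

A GUID reported as missing is only a candidate: the search query itself must be one that would have matched the entity.
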
#### Location
The tool is part of the normal Atlas installation; it is located under the tools/atlas-index-repair directory.
#### Steps to Execute Tool
##### Complete Restore
To restore all the indexes, execute the tool with no command-line parameters:
>atlas-index-repair/repair_index.py

This completely rebuilds vertex_index, edge_index and fulltext_index. It is recommended to delete the existing contents of these indexes before executing this restore.
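
You can clear the three collections with a delete-by-query sent to Solr's update handler. The following is a minimal sketch. It assumes an unsecured Solr reachable at a hypothetical `http://localhost:8983/solr` and uses Solr's JSON update syntax. A Kerberos-secured Solr would need authentication that is not shown here. By default the sketch only builds the requests; nothing is deleted unless `dry_run=False`.

```python
import json
import urllib.request

# Hypothetical Solr base URL; replace with the Solr instance backing Atlas.
SOLR_BASE_URL = "http://localhost:8983/solr"

# The three Atlas index collections rebuilt by a complete restore.
ATLAS_COLLECTIONS = ("vertex_index", "edge_index", "fulltext_index")


def build_clear_request(collection, base_url=SOLR_BASE_URL):
    """Build (but do not send) a request that deletes every document in
    `collection` and commits the change."""
    url = "%s/%s/update?commit=true" % (base_url.rstrip("/"), collection)
    body = json.dumps({"delete": {"query": "*:*"}}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json"})


def clear_atlas_indexes(base_url=SOLR_BASE_URL, dry_run=True):
    """Return the requests for all three collections; send them only
    when dry_run is False."""
    reqs = [build_clear_request(c, base_url) for c in ATLAS_COLLECTIONS]
    if not dry_run:
        for req in reqs:
            with urllib.request.urlopen(req) as resp:
                print(req.full_url, resp.status)
    return reqs


if __name__ == "__main__":
    for req in clear_atlas_indexes():
        print(req.get_method(), req.full_url)
```
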
###### Caveats
The full index repair is a time-consuming process: depending on the size of the data, it may take days to complete. Basic Search is not available while the restore is running, so be sure to allocate sufficient time for this activity.
##### Selective Restore
To perform a selective restore for an Atlas entity, specify the GUID of that entity:
>atlas-index-repair/repair_index.py [-g \<guid>]

Example:
> atlas-index-repair/repair_index.py -g 13d77457-2a45-4e92-ad53-a172c7cb70a5
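
The launcher script starts a JVM before the GUID is parsed, so it can help to validate the GUID first. The following is a minimal sketch. It assumes Atlas GUIDs are hyphenated UUIDs like the one in the example. `selective_restore_command` is a hypothetical helper that only builds the argument list and does not run the tool.

```python
import re

# Assumption: entity GUIDs are hyphenated UUIDs, as in the example above.
GUID_PATTERN = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}")

# Path to the tool, as used in the examples.
REPAIR_SCRIPT = "atlas-index-repair/repair_index.py"


def selective_restore_command(guid):
    """Return the argument list for repairing the indexes of one entity.

    Raises ValueError for a malformed GUID, so that a typo is caught
    before the JVM-based tool is started.
    """
    guid = guid.strip()
    if not GUID_PATTERN.fullmatch(guid):
        raise ValueError("not a valid entity GUID: %r" % guid)
    return [REPAIR_SCRIPT, "-g", guid]


if __name__ == "__main__":
    print(" ".join(selective_restore_command(
        "13d77457-2a45-4e92-ad53-a172c7cb70a5")))
```
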

Note that the tool fetches the entity through the Atlas REST APIs, so the authentication mechanism used by the installation must be specified correctly.

For an Atlas installation that uses username and password authentication, use:
>atlas-index-repair/repair_index.py [-g \<guid>] [-u \<user>] [-p \<password>]

* guid: [optional] GUID of the entity whose indexes are to be repaired
* user: [optional] username for the Atlas instance
* password: [optional] password for the Atlas instance

Example:
>atlas-index-repair/repair_index.py -u admin -p admin123 -g 13d77457-2a45-4e92-ad53-a172c7cb70a5
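
Before a long run, you can check the same credentials and GUID directly against the Atlas REST API. The following is a minimal sketch of an independent check; it does not reproduce the tool's own client code. It assumes HTTP Basic authentication, a hypothetical server URL on Atlas's default port 21000, and the v2 endpoint `GET /api/atlas/v2/entity/guid/{guid}`. The helper names are illustrative.

```python
import base64
import json
import urllib.request

# Hypothetical Atlas server URL; 21000 is the default Atlas HTTP port.
ATLAS_URL = "http://localhost:21000"


def entity_request(guid, user, password, atlas_url=ATLAS_URL):
    """Build a GET request for one entity, using HTTP Basic authentication."""
    url = "%s/api/atlas/v2/entity/guid/%s" % (atlas_url.rstrip("/"), guid)
    token = base64.b64encode(("%s:%s" % (user, password)).encode("utf-8"))
    return urllib.request.Request(url, headers={
        "Authorization": "Basic " + token.decode("ascii"),
        "Accept": "application/json",
    })


def fetch_entity(guid, user, password, atlas_url=ATLAS_URL):
    """Fetch and parse the entity. urlopen raises HTTPError on failure,
    e.g. 401 for bad credentials or 404 for an unknown GUID."""
    req = entity_request(guid, user, password, atlas_url)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```
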

For an Atlas installation that uses Kerberos authentication, first obtain a ticket for the Atlas service principal with kinit, then run the tool:
>kinit -kt /etc/security/keytabs/atlas.service.keytab atlas/fqdn@DOMAIN

Example:
>kinit -kt /etc/security/keytabs/atlas.service.keytab atlas/fqdn@EXAMPLE.com
>
>atlas-index-repair/repair_index.py -g 13d77457-2a45-4e92-ad53-a172c7cb70a5
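
The two Kerberos steps can be chained so that the tool never starts without a ticket. The following is a minimal sketch using `subprocess.run(..., check=True)`. The keytab path comes from the example above. The principal is a placeholder, since Kerberos realms are conventionally upper-case. `kerberized_repair_commands` and `run_kerberized_repair` are hypothetical helpers.

```python
import subprocess

# Keytab path from the example above; the principal is a placeholder
# to be replaced with the host's Atlas service principal.
KEYTAB = "/etc/security/keytabs/atlas.service.keytab"
PRINCIPAL = "atlas/fqdn@EXAMPLE.COM"
REPAIR_SCRIPT = "atlas-index-repair/repair_index.py"


def kerberized_repair_commands(guid=None, keytab=KEYTAB, principal=PRINCIPAL):
    """Return the commands to run in order: kinit, then the repair tool."""
    kinit = ["kinit", "-kt", keytab, principal]
    repair = [REPAIR_SCRIPT]
    if guid is not None:
        repair += ["-g", guid]
    return [kinit, repair]


def run_kerberized_repair(guid=None):
    """Run kinit, then the tool, stopping at the first failure."""
    for cmd in kerberized_repair_commands(guid):
        # check=True raises CalledProcessError on a non-zero exit status,
        # so the tool is not started if kinit fails.
        subprocess.run(cmd, check=True)
```
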
@@ -72,6 +72,7 @@ capabilities around these data assets for data scientists, analysts and the data
* [[Bridge-Kafka][Kafka Bridge]]
* [[HighAvailability][Fault Tolerance And High Availability Options]]
* [[Migration-0.8-to-1.0][Migration from Apache Atlas 0.8]]
* [[AtlasRepairIndex][Index repair tool]]
---++ API Documentation
......
<?xml version="1.0" encoding="UTF-8"?>
<!--
~ Licensed to the Apache Software Foundation (ASF) under one
~ or more contributor license agreements. See the NOTICE file
~ distributed with this work for additional information
~ regarding copyright ownership. The ASF licenses this file
~ to you under the Apache License, Version 2.0 (the
~ "License"); you may not use this file except in compliance
~ with the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing, software
~ distributed under the License is distributed on an "AS IS" BASIS,
~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~ See the License for the specific language governing permissions and
~ limitations under the License.
-->
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<artifactId>apache-atlas</artifactId>
<groupId>org.apache.atlas</groupId>
<version>2.0.0-SNAPSHOT</version>
<relativePath>../../</relativePath>
</parent>
<artifactId>atlas-index-repair-tool</artifactId>
<description>Apache Atlas index repair Module</description>
<name>Apache Atlas index repair tool</name>
<packaging>jar</packaging>
<dependencies>
<dependency>
<groupId>org.apache.atlas</groupId>
<artifactId>atlas-client-v2</artifactId>
<version>${project.version}</version>
</dependency>
<dependency>
<groupId>org.apache.atlas</groupId>
<artifactId>atlas-graphdb-janus</artifactId>
<version>${project.version}</version>
</dependency>
<dependency>
<groupId>org.apache.atlas</groupId>
<artifactId>atlas-repository</artifactId>
<version>${project.version}</version>
</dependency>
<dependency>
<groupId>org.janusgraph</groupId>
<artifactId>janusgraph-core</artifactId>
<version>${janus.version}</version>
</dependency>
</dependencies>
</project>
<?xml version="1.0" encoding="UTF-8" ?>
<!--
~ Licensed to the Apache Software Foundation (ASF) under one
~ or more contributor license agreements. See the NOTICE file
~ distributed with this work for additional information
~ regarding copyright ownership. The ASF licenses this file
~ to you under the Apache License, Version 2.0 (the
~ "License"); you may not use this file except in compliance
~ with the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing, software
~ distributed under the License is distributed on an "AS IS" BASIS,
~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~ See the License for the specific language governing permissions and
~ limitations under the License.
-->
<!DOCTYPE log4j:configuration SYSTEM "log4j.dtd">
<log4j:configuration xmlns:log4j="http://jakarta.apache.org/log4j/">
<appender name="FILE" class="org.apache.log4j.RollingFileAppender">
<param name="File" value="/var/log/atlas/atlas-index-janus-repair.log"/>
<param name="Append" value="true"/>
<param name="maxFileSize" value="100MB" />
<param name="maxBackupIndex" value="20" />
<layout class="org.apache.log4j.PatternLayout">
<param name="ConversionPattern" value="%d %-5p - [%t:%x] ~ %m (%C{1}:%L)%n"/>
</layout>
</appender>
<logger name="org.apache.atlas.tools.RepairIndex" additivity="false">
<level value="info"/>
<appender-ref ref="FILE"/>
</logger>
<root>
<priority value="warn"/>
<appender-ref ref="FILE"/>
</root>
</log4j:configuration>
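
A complete restore can run for days, so the log written by the `FILE` appender above is the natural place to follow its progress. The following is a minimal sketch for summarising that log. It assumes `%d` produces log4j's default ISO8601 timestamp (`yyyy-MM-dd HH:mm:ss,SSS`). Continuation lines such as stack traces do not match the pattern and are skipped rather than attached to their record.

```python
import re
from collections import Counter

# Matches one line written by the ConversionPattern above:
#   %d %-5p - [%t:%x] ~ %m (%C{1}:%L)%n
LINE_RE = re.compile(
    r"(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+)\s+- "
    r"\[(?P<thread>[^\]]*)\] ~ "
    r"(?P<message>.*) "
    r"\((?P<cls>[^():]+):(?P<line>\d+|\?)\)")


def parse_lines(lines):
    """Yield a dict per log record; non-matching lines are skipped."""
    for raw in lines:
        m = LINE_RE.fullmatch(raw.rstrip("\r\n"))
        if m:
            yield m.groupdict()


def level_counts(lines):
    """Count records per level, e.g. to spot WARN/ERROR during a long run."""
    return Counter(rec["level"] for rec in parse_lines(lines))
```

For example, opening `/var/log/atlas/atlas-index-janus-repair.log` and passing the file object to `level_counts` gives a per-level count of the records written so far.
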
#!/usr/bin/env python
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import sys
sys.path.insert(0, '/usr/hdp/current/atlas-server/bin/')
import traceback
import subprocess
import atlas_config as mc
ATLAS_LOG_FILE="atlas-index-janus-repair.log"
ATLAS_LOG_OPTS="-Datlas.log.dir=%s -Datlas.log.file="+ATLAS_LOG_FILE
ATLAS_COMMAND_OPTS="-Datlas.home=%s"
ATLAS_CONFIG_OPTS="-Datlas.conf=%s"
DEFAULT_JVM_HEAP_OPTS="-Xmx4096m -XX:MaxPermSize=512m"
DEFAULT_JVM_OPTS="-Dlog4j.configuration=atlas-log4j.xml -Djava.net.preferIPv4Stack=true -server"


def main():
    atlas_home = mc.atlasDir()
    confdir = mc.dirMustExist(mc.confDir(atlas_home))
    mc.executeEnvSh(confdir)
    logdir = mc.dirMustExist(mc.logDir(atlas_home))
    mc.dirMustExist(mc.dataDir(atlas_home))
    if mc.isCygwin():
        # Pathnames that are passed to JVM must be converted to Windows format.
        jvm_atlas_home = mc.convertCygwinPath(atlas_home)
        jvm_confdir = mc.convertCygwinPath(confdir)
        jvm_logdir = mc.convertCygwinPath(logdir)
    else:
        jvm_atlas_home = atlas_home
        jvm_confdir = confdir
        jvm_logdir = logdir

    print("Logging: " + os.path.join(jvm_logdir, ATLAS_LOG_FILE))

    # Create system properties for the log, home and conf dirs.
    jvm_opts_list = (ATLAS_LOG_OPTS % (jvm_logdir)).split()
    cmd_opts = (ATLAS_COMMAND_OPTS % jvm_atlas_home)
    jvm_opts_list.extend(cmd_opts.split())
    config_opts = (ATLAS_CONFIG_OPTS % jvm_confdir)
    jvm_opts_list.extend(config_opts.split())

    # Heap and JVM options can be overridden through the Atlas environment variables.
    atlas_server_heap_opts = os.environ.get(mc.ATLAS_SERVER_HEAP, DEFAULT_JVM_HEAP_OPTS)
    jvm_opts_list.extend(atlas_server_heap_opts.split())
    atlas_server_jvm_opts = os.environ.get(mc.ATLAS_SERVER_OPTS)
    if atlas_server_jvm_opts:
        jvm_opts_list.extend(atlas_server_jvm_opts.split())
    atlas_jvm_opts = os.environ.get(mc.ATLAS_OPTS, DEFAULT_JVM_OPTS)
    jvm_opts_list.extend(atlas_jvm_opts.split())

    # Expand the web app dir; its classes and libraries go on the classpath.
    web_app_dir = mc.webAppDir(atlas_home)
    mc.expandWebApp(atlas_home)
    p = os.pathsep
    atlas_classpath = os.path.join(os.getcwd(), ".", "*") + p \
        + confdir + p \
        + os.path.join(web_app_dir, "atlas", "WEB-INF", "classes") + p \
        + os.path.join(web_app_dir, "atlas", "WEB-INF", "lib", "*") + p \
        + os.path.join(atlas_home, "libext", "*")

    is_hbase = mc.is_hbase(confdir)
    if is_hbase:
        # Add hbase-site.xml to the classpath.
        hbase_conf_dir = mc.hbaseConfDir(atlas_home)
        if os.path.exists(hbase_conf_dir):
            atlas_classpath = atlas_classpath + p + hbase_conf_dir
        else:
            raise Exception("Could not find hbase-site.xml in %s. Please set env var "
                            "HBASE_CONF_DIR to the hbase client conf dir" % hbase_conf_dir)

    if mc.isCygwin():
        atlas_classpath = mc.convertCygwinPath(atlas_classpath, True)

    atlas_pid_file = mc.pidFile(atlas_home)
    if os.path.isfile(atlas_pid_file):
        # Read the pid listed in atlas.pid (the value is not checked further here).
        with open(atlas_pid_file, 'r') as pf:
            pid = pf.read().strip()

    if is_hbase and mc.is_hbase_local(confdir):
        print("configured for local hbase.")
        mc.configure_hbase(atlas_home)
        mc.run_hbase_action(mc.hbaseBinDir(atlas_home), "start", hbase_conf_dir, logdir)
        print("hbase started.")

    web_app_path = os.path.join(web_app_dir, "atlas")
    if mc.isCygwin():
        web_app_path = mc.convertCygwinPath(web_app_path)

    return start_index_repair(atlas_classpath, atlas_pid_file, jvm_logdir, jvm_opts_list, web_app_path)


def start_index_repair(atlas_classpath, atlas_pid_file, jvm_logdir, jvm_opts_list, web_app_path):
    # Forward all command-line arguments (-g, -u, -p) to the Java tool.
    args = sys.argv[1:]
    return java("org.apache.atlas.tools.RepairIndex", args, atlas_classpath, jvm_opts_list)


def java(classname, args, classpath, jvm_opts_list):
    java_home = os.environ.get("JAVA_HOME", None)
    if java_home:
        prg = os.path.join(java_home, "bin", "java")
    else:
        prg = mc.which("java")

    if prg is None:
        raise EnvironmentError('The java binary could not be found in your path or JAVA_HOME')

    commandline = [prg]
    commandline.extend(jvm_opts_list)
    commandline.append("-classpath")
    commandline.append(classpath)
    commandline.append(classname)
    commandline.extend(args)

    # Run the tool in the foreground and return its exit status to the caller.
    p = subprocess.Popen(commandline)
    p.communicate()
    return p.returncode


if __name__ == '__main__':
    try:
        returncode = main()
    except Exception as e:
        print("Exception: %s " % str(e))
        print(traceback.format_exc())
        returncode = -1

    sys.exit(returncode)