---
name: Import API Options
route: /ImportAPIOptions
menu: Documentation
submenu: Import/Export
---

import  themen  from 'theme/styles/styled-colors';
import  * as theme  from 'react-syntax-highlighter/dist/esm/styles/hljs';
import SyntaxHighlighter from 'react-syntax-highlighter';

# Import API Options

Import API options are specified as _options_ JSON. Since the API accepts multi-part form data, it is possible to specify multiple input streams within the CURL call.

### Examples Using CURL Calls
<SyntaxHighlighter wrapLines={true} language="shell" style={theme.dark}>
{`curl -g -X POST -u adminuser:password -H "Content-Type: multipart/form-data"
            -H "Cache-Control: no-cache"
            -F request=@importOptions.json
            -F data=@quickStartDB.zip
            "http://localhost:21000/api/atlas/admin/import"`}
</SyntaxHighlighter>

To use the defaults, set the contents of _importOptions.json_ to:

<SyntaxHighlighter wrapLines={true} language="json" style={theme.dark}>
{`{
  "options": {
  }
}`}
</SyntaxHighlighter>


### Options
The following options are supported for the import process:

   * Specify transforms during the import operation.
   * Resume an import by specifying the starting entity GUID.
   * Optionally import type definitions.
   * Handle large imports.

#### Transforms

During the import process, the attribute values of incoming entities can be changed.

This is done by specifying the entity type, the attribute to be modified, and the manner in which it is to be modified.

Currently, the following transforms can be applied:
   * _lowercase_: Converts the attribute value to lower case.
   * _replace_: Performs a string find-and-replace operation. It takes two parameters: the first is the string to search for and the second is the string to replace it with.

Example:

The example below applies a _replace_ transform to the _qualifiedName_ attribute of hive_table and hive_db. It searches for '@cl1' and, if found, replaces it with '@cl2'.

To use the option, set the contents of _importOptions.json_ to:

<SyntaxHighlighter wrapLines={true} language="json" style={theme.dark}>
{`{
  "options": {
    "transforms": {"hive_table": { "qualifiedName": [ "replace:@cl1:@cl2" ] }, "hive_db": { "qualifiedName": [ "replace:@cl1:@cl2" ] } }
  }
}`}
</SyntaxHighlighter>
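As a rough illustration of what the two transforms do to an attribute value, the Python sketch below applies a list of transform expressions in order. It is not Atlas code: the function name `apply_transforms` and the parsing of an expression as a name followed by colon-separated parameters are assumptions made for illustration, modeled on the `replace:@cl1:@cl2` expression used in this section.

```python
def apply_transforms(value, transforms):
    """Apply each transform expression to value, in the order given."""
    for expression in transforms:
        # Assumed format: transform name, then colon-separated parameters.
        name, *params = expression.split(":")
        if name == "lowercase":
            value = value.lower()
        elif name == "replace":
            # First parameter is the search string, second is the replacement.
            search, replacement = params
            value = value.replace(search, replacement)
        else:
            raise ValueError(f"unsupported transform: {name}")
    return value


if __name__ == "__main__":
    # Lower-case the value first, then swap the cluster suffix.
    print(apply_transforms("Sales.Customers@cl1", ["lowercase", "replace:@cl1:@cl2"]))
```

A value that does not contain the search string passes through the _replace_ transform unchanged.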

Please refer to [ATLAS-1825](https://issues.apache.org/jira/browse/ATLAS-1825) for detailed scenarios in which this option can be used.

#### Start Guid or Start Index

When an import operation is in progress and the server goes down, the import can be resumed from the last successfully imported entity instead of starting over.

Server-side logging displays the details of the last successfully imported entity, including its index within the import list and its GUID. Either can be used to specify the point from which to resume the import.

To use the option, set the contents of _importOptions.json_ to:

<SyntaxHighlighter wrapLines={true} language="json" style={theme.dark}>
{`{
  "options": {
    "startGuid": "bd97c78e-3fa5-4f9c-9f48-3683ca3d1fb1"
  }
}`}
</SyntaxHighlighter>

To use _startPosition_, use the following in the _importOptions.json_:

<SyntaxHighlighter wrapLines={true} language="json" style={theme.dark}>
{`{
  "options": {
    "startPosition": "332"
  }
}`}
</SyntaxHighlighter>

Steps to use the behavior:
   * Start an import (using CURL) that is fairly long, say about 1000 entities.
   * While the import is in progress, stop the Atlas server (using atlas_stop.py).
   * From the log file located at _/var/log/atlas/application.log_, get the GUID or index position of the last successfully imported entity.
   * Update _importOptions.json_ with the GUID or index position.
   * Restart the import.
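
The step of updating _importOptions.json_ can be scripted. The Python sketch below builds the options document for either resume key and writes it to a file. The helper names `build_resume_options` and `write_options` are hypothetical, and requiring exactly one of the two keys is an assumption of this sketch rather than documented Atlas behavior. Reading the GUID or position out of the log is left to the operator.

```python
import json


def build_resume_options(start_guid=None, start_position=None):
    """Return the import options document for resuming at a GUID or an index."""
    # Assumption of this sketch: exactly one resume key is supplied.
    if (start_guid is None) == (start_position is None):
        raise ValueError("specify exactly one of start_guid or start_position")
    if start_guid is not None:
        options = {"startGuid": start_guid}
    else:
        # The example in this section passes the position as a string.
        options = {"startPosition": str(start_position)}
    return {"options": options}


def write_options(path, document):
    """Write the options document as JSON, ready to be passed to the CURL call."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(document, handle, indent=2)


if __name__ == "__main__":
    document = build_resume_options(start_guid="bd97c78e-3fa5-4f9c-9f48-3683ca3d1fb1")
    write_options("importOptions.json", document)
```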

#### Optional Importing Type Definition

The output of Export includes _atlas-typedef.json_, which contains the type definitions for the exported entities.

By default (that is, if no option is specified), the type definitions are imported and applied to the target system. The entity import is performed after this.

In some cases, you may not want to modify the type definitions. It may be preferable for the import to fail rather than for the types to be modified.

This option makes importing type definitions optional. It is set to _true_ by default, which means that type definitions are imported. With this option set to _false_, type definitions present in the source will not be imported. If the entities being imported do not match the types present in the target system, the operation will fail.

The table below enumerates the conditions that are addressed as part of type definition import:

|**Condition**|**Action**|
|-------------|----------|
| Incoming type does not exist in target system | Type is created. |
| Type to be imported and type in target system are the same | No change. |
| Type to be imported and type in target system differ by some attributes | Target system type is updated with the attributes present in the source.<br /> The target system may have attributes in addition to those present in the source.<br /> In that case, the target system's type attributes will be a union of the two sets of attributes.<br /> Attributes in the target system will not be deleted to match the source.<br /> If the type of an attribute differs, the import process will be aborted and an exception logged. |

To use the option, set the contents of _importOptions.json_ to:

<SyntaxHighlighter wrapLines={true} language="json" style={theme.dark}>
{`{
  "options": {
    "updateTypeDefinition": true
  }
}`}
</SyntaxHighlighter>
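
The rules in the table above can be read as a small merge procedure. The Python sketch below models them with plain dictionaries; it is an illustration only, not the Atlas implementation. The function name `merge_type` and the representation of a type as a mapping from attribute name to data type are assumptions.

```python
def merge_type(target_types, incoming_name, incoming_attrs):
    """Apply the table's rules to target_types (type name -> {attribute: data type})."""
    if incoming_name not in target_types:
        # Incoming type does not exist in the target system: create it.
        target_types[incoming_name] = dict(incoming_attrs)
        return "created"

    existing = target_types[incoming_name]
    # Check for conflicts before changing anything.
    for attr, data_type in incoming_attrs.items():
        if attr in existing and existing[attr] != data_type:
            raise ValueError(f"attribute {attr!r} has conflicting types; import aborted")

    added = {a: t for a, t in incoming_attrs.items() if a not in existing}
    if not added:
        # Nothing new to add; the target type stays as it is.
        return "no change"

    # Union of attributes: nothing is deleted from the target type.
    existing.update(added)
    return "updated"
```

For example, importing a `hive_table` type that carries an extra `comment` attribute leaves the target's existing attributes in place and adds `comment`.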

#### Specifying File to be Imported From Server Location

In scenarios where the file to be imported is present at a location on the server, the _importfile_ API can be used. It behaves like the Import API.

To use the option, set the contents of _importOptions.json_ to:

<SyntaxHighlighter wrapLines={true} language="json" style={theme.dark}>
{`{
  "options": {
    "fileName": "/root/fileToBeImported.zip"
  }
}`}
</SyntaxHighlighter>

_CURL_

<SyntaxHighlighter wrapLines={true} language="shell" style={theme.dark}>
{`curl -g -X POST -u adminuser:password -H "Content-Type: application/json"
            -H "Cache-Control: no-cache"
            -d @importOptions.json
            "http://localhost:21000/api/atlas/admin/importfile"`}
</SyntaxHighlighter>

#### Handling Large Imports

By default, the Import Service stores all of the data in memory. This may be limiting for ZIPs containing a large amount of data.

To use a directory on disk as the backing store instead, set the application property _atlas.import.temp.directory_ to a temporary directory. If this property is left blank, the default in-memory implementation is used.

Please ensure that there is sufficient disk space available for the operation.

The contents of the directory created as the backing store for the import operation will be erased after the operation is over.
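
One way to check disk space before starting a large import is sketched below in Python. It is a generic pre-flight check, not part of Atlas: the function name `has_free_space` is hypothetical, and how many bytes an import actually needs is not documented here, so the caller has to choose `required_bytes`.

```python
import shutil


def has_free_space(directory, required_bytes):
    """Return True if the filesystem holding directory has at least required_bytes free."""
    free_bytes = shutil.disk_usage(directory).free
    return free_bytes >= required_bytes


if __name__ == "__main__":
    # Example: require 1 GiB free in the current directory (an arbitrary threshold).
    print(has_free_space(".", 1024 ** 3))
```

In practice, `directory` would be the path configured in _atlas.import.temp.directory_, and `required_bytes` a margin chosen relative to the size of the ZIP being imported.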