Commit a2dc0ba8 by Ashutosh Mestry

ATLAS-2839: Export-Import New Features Documentation.

parent ba2b1449
......@@ -83,6 +83,11 @@
</dependency>
<dependency>
<groupId>org.apache.maven.doxia</groupId>
<artifactId>doxia-module-markdown</artifactId>
<version>${doxia.version}</version>
</dependency>
<dependency>
<groupId>org.apache.maven.doxia</groupId>
<artifactId>doxia-core</artifactId>
<version>${doxia.version}</version>
</dependency>
......
<!--
~ Licensed to the Apache Software Foundation (ASF) under one
~ or more contributor license agreements. See the NOTICE file
~ distributed with this work for additional information
~ regarding copyright ownership. The ASF licenses this file
~ to you under the Apache License, Version 2.0 (the
~ "License"); you may not use this file except in compliance
~ with the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing, software
~ distributed under the License is distributed on an "AS IS" BASIS,
~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~ See the License for the specific language governing permissions and
~ limitations under the License.
-->
# Atlas Server Entity Type
#### Background
The _AtlasServer_ entity type is a special entity type in the following ways:
* It gets created during Export and Import operations.
* It has special property pages that display detailed audits for export and import operations.
* Entities are linked to it using the new entity attribute option _[SoftReference](SoftReference)_.

The new type is available within the _Search By Type_ dropdown in both _Basic_ and _Advanced_ search.
#### Creation
An entity of this type is created upon successful completion of every Export and Import operation, using the current cluster's name.
An entity is also created based on the _replicatedTo_ and _replicatedFrom_ parameters of export and import requests.
#### Details within Property Page
The property page for an _AtlasServer_ entity has an additional tab, 'Export/Import Audits', which holds a detailed audit record for each export and/or import operation performed on the current Atlas instance.
The _additionalInfo_ attribute is discussed in detail below.
<img src="images/markdown/atlas-server-properties.png" style="border:1px solid; margin-left:25px"/>
###### Export/Import Audits
The table has the following columns:
* _Operation_: EXPORT or IMPORT, denoting the operation performed on this instance.
* _Source Server_: For an export operation performed on this instance, the value in this column will always be the cluster name of the current Atlas instance. This is the value specified in _atlas-application.properties_ by the key _atlas.cluster.name_. If no value is specified, 'default' is used.
* _Target Server_: If an export operation is performed with the _replicatedTo_ property specified in the request, that value appears here.
* _Operation StartTime_: Time the operation started.
* _Operation EndTime_: Time the operation completed.
* _Tools_: Pop-up property page containing details of the operation.
<img src="images/markdown/atlas-server-exp-imp-audits.png" style="border:1px solid; margin-left:25px"/>
###### Example
The following export request creates an _AtlasServer_ entity with _clMain_ as its name. The audit record of this operation is displayed within the property page of this entity.
```json
{
    "itemsToExport": [
        { "typeName": "hive_db", "uniqueAttributes": { "qualifiedName": "stocks@cl1" } }
    ],
    "options": {
        "replicatedTo": "clMain"
    }
}
```
#### Support for Cluster's Full Name
It is often necessary to disambiguate the name of a cluster by specifying the location or data center within which the Atlas instance resides.
The name of the cluster can be qualified by separating the location name and cluster name with '$'. For example, a cluster name specified as 'SFO$cl1' denotes a cluster named 'cl1' in the San Francisco (SFO) data center.
_AtlasServer_ will handle this and set its name as 'cl1' and _fullName_ as 'SFO@cl1'.
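The splitting rule described above can be sketched as a small helper. This is an illustrative, hypothetical function (not part of the Atlas codebase); it assumes the part after '$' becomes the short name while the full specified string is retained:

```python
# Hypothetical helper illustrating the 'location$cluster' naming rule above;
# not part of the Atlas codebase.
def split_server_name(qualified: str):
    """Split a 'location$cluster' server name into (name, full_name)."""
    if "$" in qualified:
        _location, name = qualified.split("$", 1)
        return name, qualified   # short name, full specified string
    return qualified, qualified  # no location qualifier present

name, full = split_server_name("SFO$cl1")  # name: 'cl1'
```
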
#### Additional Information
This property of _AtlasServer_ is a map whose keys and values are both strings. It can be used to store any information pertaining to this instance.
Please see [Incremental Export](IncrementalExport) for an example of how this property can be used.
#### REST APIs
Title            | Atlas Server API                        |
-----------------|-----------------------------------------|
Example          | See below.                              |
URL              | api/atlas/admin/server/{serverName}     |
Method           | GET                                     |
URL Parameters   | Name of the server.                     |
Data Parameters  | None                                    |
Success Response | _AtlasServer_                           |
Error Response   | Errors returned as _AtlasBaseException_ |
###### CURL
```
curl -X GET -u admin:admin -H "Content-Type: application/json" -H "Cache-Control: no-cache" http://localhost:21000/api/atlas/admin/server/cl2
```
Output:
```json
{
"guid": "f87e4fd1-bfb5-482d-9ab1-e735621b7d16",
"name": "cl2",
"qualifiedName": "cl2",
"additionalInfo": {
"nextModifiedTimestamp": "1533037289383",
"replicationOperation": "EXPORT",
"topLevelEntity": "stocks@cl1"
}
}
```
# Export & Import Audits
#### Background
The new audits for Export and Import operations also have corresponding REST APIs to programmatically fetch the audit entries.
#### REST APIs
|Title | Replication Audits for a Cluster |
|----------------|------------------------------------------------------------------|
|Example | See below. |
|URL | api/atlas/admin/expimp/audit |
|Method | GET |
|URL Parameters | _sourceClusterName_: Name of source cluster. |
| | _targetClusterName_: Name of target cluster. |
| | _userName_: Name of the user who initiated the operation. |
| | _operation_: EXPORT or IMPORT operation type. |
| | _startTime_: Time, in milliseconds, when operation was started. |
| | _endTime_: Time, in milliseconds, when operation ended. |
| | _limit_: Maximum number of results to return. |
| | _offset_: Offset into the result set. |
|Data Parameters | None |
|Success Response| List of _ExportImportAuditEntry_ |
|Error Response | Errors Returned as AtlasBaseException |
|Notes | None |
###### CURL
```
curl -X GET -u admin:admin -H "Content-Type: application/json" -H "Cache-Control: no-cache" 'http://localhost:21000/api/atlas/admin/expimp/audit?sourceClusterName=cl2'
```
```json
{
    "queryType": "BASIC",
    "searchParameters": {
        "typeName": "ReplicationAuditEntry",
        "excludeDeletedEntities": false,
        "includeClassificationAttributes": false,
        "includeSubTypes": true,
        "includeSubClassifications": true,
        "limit": 100,
        "offset": 0,
        "entityFilters": {
            "attributeName": "name",
            "operator": "eq",
            "attributeValue": "cl2",
            "criterion": []
        }
    },
    "entities": [{
        "typeName": "ReplicationAuditEntry",
        "attributes": {
            "owner": null,
            "uniqueName": "cl2:EXPORT:1533037289411",
            "createTime": null,
            "name": "cl2",
            "description": null
        },
        "guid": "04844141-af72-498a-9d26-f70f91e8adf8",
        "status": "ACTIVE",
        "displayText": "cl2",
        "classificationNames": []
    }, {
        "typeName": "ReplicationAuditEntry",
        "attributes": {
            "owner": null,
            "uniqueName": "cl2:EXPORT:1533037368407",
            "createTime": null,
            "name": "cl2",
            "description": null
        },
        "guid": "837abe66-20c8-4296-8678-e715498bf8fb",
        "status": "ACTIVE",
        "displayText": "cl2",
        "classificationNames": []
    }]
}
```
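The audit query shown above can be composed programmatically. This is a minimal sketch assuming the base URL and parameter names from the table; the host, port, and credentials are placeholders:

```python
from urllib.parse import urlencode

# Base endpoint taken from the REST API table above; host/port are assumptions.
BASE = "http://localhost:21000/api/atlas/admin/expimp/audit"

def audit_url(**params):
    """Build an Export/Import audit query URL from optional filter parameters."""
    filtered = {k: v for k, v in params.items() if v is not None}
    return f"{BASE}?{urlencode(filtered)}" if filtered else BASE

# e.g. audit_url(sourceClusterName="cl2", operation="EXPORT", limit=10)
```
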
# (New) Entity Transforms Framework
#### Background
During the import process, entity transforms make changes to an entity before it is committed to the database. These modifications are necessary to make the entity conform to the environment in which it will reside. The import process provides a mechanism to do this.
#### Transformation Framework
A transformation framework provides a mechanism to selectively transform an entity or specific attributes of that entity.
To achieve this, the framework provides:
* A way to set a condition that must be satisfied for a transformation to be applied.
* An action to be taken on the entity once the condition is met.

The existing transformation framework supported this.
#### Reason for New Transformation Framework
While the existing framework provided the basic benefits of a transformation framework, it did not support some commonly used Atlas types. This meant that users of the framework had to meticulously define transformations for every type they worked with, which was tedious and potentially error-prone.
The new framework addresses this problem by providing built-in transformations for some of the commonly used types. It can also be extended to accommodate new types.
#### Approach
The new transformation framework creates a transformation by:
* Specifying a condition.
* Specifying action(s) to be taken if the condition is met.
##### Conditions
The following conditions are built in.

Condition Types | Description |
-----------------------------------------|-----------------|
ENTITY_ALL | Any/every entity. |
ENTITY_TOP_LEVEL | Entity that is the top-level entity, i.e. the entity specified in _AtlasExportRequest_. |
EQUALS | Entity attribute equals the value specified in the condition. |
EQUALS_IGNORE_CASE | Entity attribute equals the value specified in the condition, ignoring case. |
STARTS_WITH | Entity attribute starts with the specified value. |
STARTS_WITH_IGNORE_CASE | Entity attribute starts with the specified value, ignoring case. |
HAS_VALUE | Entity attribute has a value. |
##### Actions
Action Type | Description |
-------------------|----------------------------------------------|
ADD_CLASSIFICATION | Add a classification. |
REPLACE_PREFIX | Replace a matching prefix of a value with another value. |
TO_LOWER | Convert the value of an attribute to lower case. |
SET | Set the value of an attribute. |
CLEAR | Clear the value of an attribute. |
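The condition/action pairing above can be sketched as a minimal evaluator. This is an illustrative sketch, not the Atlas implementation; it assumes entities are plain dicts of attribute values and covers only a few of the operators listed:

```python
# Minimal sketch of condition/action transform evaluation, assuming entities
# are plain dicts. Illustrative only; not the Atlas implementation.
def transform(entity, conditions, actions):
    """Apply actions to entity only if every condition holds.

    conditions/actions map attribute name -> (operator, argument), e.g.
    {"clusterName": ("EQUALS", "CL1")} and {"path": ("TO_LOWER", None)}.
    """
    for attr, (op, arg) in conditions.items():
        value = str(entity.get(attr) or "")
        if op == "EQUALS" and value != arg:
            return entity  # condition failed; leave entity unchanged
        if op == "HAS_VALUE" and not value:
            return entity
    for attr, (op, arg) in actions.items():
        if op == "TO_LOWER":
            entity[attr] = str(entity.get(attr) or "").lower()
        elif op == "SET":
            entity[attr] = arg
        elif op == "CLEAR":
            entity[attr] = None
    return entity
```
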
#### Built-in Transforms
###### Add Classification
During import, a hive_db entity whose _qualifiedName_ is _stocks@cl1_ will get the classification _clSrcImported_:
```json
{
    "conditions": {
        "hive_db.qualifiedName": "stocks@cl1"
    },
    "action": {
        "__entity": "ADD_CLASSIFICATION: clSrcImported"
    }
}
```
To apply the classification to every imported entity, simply change the condition. The '__entity' key is a special condition that matches every entity:
```json
{
    "conditions": {
        "__entity": ""
    },
    "action": {
        "__entity": "ADD_CLASSIFICATION: clSrcImported"
    }
}
```
To add the classification to only the top-level entity (the entity used as the starting point for an export), use:
```json
{
    "conditions": {
        "__entity": "topLevel:"
    },
    "action": {
        "__entity": "ADD_CLASSIFICATION: clSrcImported"
    }
}
```
###### Replace Prefix
This action works on string values. The first parameter is the prefix to search for; when a value matches, the prefix is replaced with the provided replacement string.
The sample below searches for _/aa/bb/_ and, once found, replaces it with _/xx/yy/_:
```json
{
    "conditions": {
        "hdfs_path.clusterName": "EQUALS: CL1"
    },
    "action": {
        "hdfs_path.path": "REPLACE_PREFIX: = :/aa/bb/=/xx/yy/"
    }
}
```
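The prefix-replacement semantics described above can be sketched as follows. This is a hypothetical illustration of the behavior, not the Atlas implementation:

```python
# Sketch of REPLACE_PREFIX semantics: replace the leading prefix only when
# the value actually starts with it. Illustrative only.
def replace_prefix(value: str, prefix: str, replacement: str) -> str:
    if value.startswith(prefix):
        return replacement + value[len(prefix):]
    return value  # no match; value is left unchanged

replace_prefix("/aa/bb/file1", "/aa/bb/", "/xx/yy/")  # -> "/xx/yy/file1"
```
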
###### To Lower
An entity whose _hdfs_path.clusterName_ is _CL1_ will have its _path_ attribute converted to lower case:
```json
{
    "conditions": {
        "hdfs_path.clusterName": "EQUALS: CL1"
    },
    "action": {
        "hdfs_path.path": "TO_LOWER:"
    }
}
```
###### Clear
An entity whose _hdfs_path.clusterName_ has a value set will have its _replicatedTo_ attribute cleared:
```json
{
    "conditions": {
        "hdfs_path.clusterName": "HAS_VALUE:"
    },
    "action": {
        "hdfs_path.replicatedTo": "CLEAR:"
    }
}
```
#### Additional Examples
Please look at [these tests](https://github.com/apache/atlas/blob/master/intg/src/test/java/org/apache/atlas/entitytransform/TransformationHandlerTest.java) for examples using Java classes.
## Incremental Export
#### Background
Incremental export allows exporting entities that changed after a specified timestamp. This makes payloads lighter and enables synchronization capabilities to be built on top of it.
#### Export Options
A new _fetchType_ value, _incremental_, indicates incremental export. This option can be used with any _matchType_. When _fetchType_ is _incremental_, the _changeMarker_ option must be specified for incremental export to function; otherwise a full export is performed.
```json
{
    "itemsToExport": [
        { "typeName": "hive_db", "uniqueAttributes": { "qualifiedName": "stocks@cl1" } }
    ],
    "options": {
        "fetchType": "incremental",
        "changeMarker": 10000
    }
}
```
#### Getting Change Marker
The very first call to export with _fetchType_ set to _incremental_ should be made with _changeMarker_ set to 0. This performs a full export. The resulting _AtlasExportResult_ will have _changeMarker_ set to a new value, which should be used in the subsequent call to export.
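The call sequence above can be sketched as a request builder. The request shape follows the example payloads in this document; the hive_db item is a hypothetical placeholder:

```python
# Sketch of building incremental export requests; the itemsToExport entry is a
# hypothetical example following the payloads shown in this document.
def incremental_export_request(change_marker: int) -> dict:
    """Build an incremental export request; pass 0 for the first (full) export."""
    return {
        "itemsToExport": [
            {"typeName": "hive_db",
             "uniqueAttributes": {"qualifiedName": "stocks@cl1"}}
        ],
        "options": {
            "fetchType": "incremental",
            "changeMarker": change_marker,  # 0 on first call; then the value
        },                                  # returned in AtlasExportResult
    }

first = incremental_export_request(0)      # full export
nxt = incremental_export_request(10000)    # subsequent incremental export
```
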
#### Skip Lineage Option
Export can be performed skipping lineage information. This prevents any lineage information from getting into the exported file.
#### Benefit of Incremental Export
The real benefit of incremental export comes when export is done with _skipLineage_ option set to _true_. This greatly improves performance when fetching entities that have changed since the last export.
# Replicated Attributes
#### Background
Users want to know how entities landed in an Atlas instance: whether they were created via hook ingestion or imported from another Atlas instance.
This is addressed by two new attributes that are now part of the _Referenceable_ entity type: _replicatedFrom_ and _replicatedTo_.
# Entity Attribute Option: SoftReference
#### Background
Entity attributes are specified using attribute definitions. An attribute's persistence strategy is determined by its type.
Primitive types are persisted as properties within the vertex of their parent.
Non-primitive attributes get a vertex of their own, and an edge is created between the parent and the child to establish ownership.
An attribute with the _isSoftReference_ option set to _true_ is a non-primitive attribute that is treated like a primitive attribute.
#### Specification
Below is an example of using the new attribute option.
```json
"attributeDefs": [
    {
        "name": "replicatedFrom",
        "typeName": "array<AtlasServer>",
        "cardinality": "SET",
        "isIndexable": false,
        "isOptional": true,
        "isUnique": false,
        "options": {
            "isSoftReference": "true"
        }
    },
```
......@@ -36,6 +36,7 @@ Current implementation has 2 options. Both are optional:
* _fetchType_ This option configures the approach used for fetching entities. It has following values:
* _FULL_: This fetches all the entities that are connected directly and indirectly to the starting entity. E.g. If a starting entity specified is a table, then this option will fetch the table, database and all the other tables within the database.
* _CONNECTED_: This fetches all the entities that are connected directly to the starting entity. E.g. If a starting entity specified is a table, then this option will fetch the table and the database entity only.
* _INCREMENTAL_: See [[Incremental-Export][here]] for details.
If no _matchType_ is specified, an exact match is used, which means the entire string is used as the search criterion.
......
......@@ -10,7 +10,6 @@ Also, HDFS paths tend to be hierarchical, in the sense that users tend to model
__Sample HDFS Setup__
<table border="1" cellpadding="pixels" cellspacing="pixels">
<tr>
<th><strong>HDFS Path</strong></th> <th><strong>Atlas Entity</strong></th>
......@@ -77,3 +76,6 @@ curl -X POST -u adminuser:password -H "Content-Type: application/json" -H "Cache
}
}' "http://localhost:21000/api/atlas/admin/export" > financeAll.zip
</verbatim>
---+++ Automatic Creation of HDFS entities
Given that HDFS entity creation is a manual process, the Export API offers a mechanism to create the requested HDFS entities.
---+ Export & Import REST APIs
---+++ What's New
The release of 0.8.3 includes the following improvements to Export and Import APIs:
* Export: Support for _[[Incremental-Export][Incremental Export]]_.
* Export & Import: Support for [[ReplicatedToFromAttributes][replicated attributes]] to entities made possible by _[[SoftReference][SoftReference]]_ entity attribute option.
* Export option: _[[Incremental-Export][skipLineage]]_.
* New entity transforms framework.
* New _[[AtlasServer][AtlasServer]]_ entity type.
   * Export: [[Export-HDFS-API][Automatic creation]] of requested HDFS path entities.
* New [[ExportImportAudits][audits]] for Export & Import operations.
---+++ Background
The Import-Export APIs for Atlas facilitate transfer of data to and from a cluster that has Atlas provisioned.
......