---+ Data Governance and Metadata framework for Hadoop


---++ Overview

Atlas is a scalable and extensible set of core foundational governance services that enables
enterprises to effectively and efficiently meet their compliance requirements within Hadoop and
allows integration with the whole enterprise data ecosystem.

Apache Atlas provides open metadata management and governance capabilities for organizations
to build a catalog of their data assets, classify and govern these assets, and provide collaboration
capabilities around them for data scientists, analysts, and the data governance team.

---++ Features

---+++ Metadata types & instances
   * Pre-defined types for various Hadoop and non-Hadoop metadata
   * Ability to define new types for the metadata to be managed
   * Types can have primitive attributes, complex attributes, and object references, and can inherit from other types
   * Instances of types, called entities, capture metadata object details and their relationships
   * REST APIs to work with types and instances allow easier integration
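
As a rough illustration of defining a new type, the sketch below builds a type-definition request body as plain Python data. The type name =sample_table= and its attributes are hypothetical, and the JSON field names (=entityDefs=, =superTypes=, =attributeDefs=, =typeName=, =isOptional=) are assumptions about the v2 REST payload that should be checked against the REST API Documentation before use.

```python
import json


def make_entity_def(name, super_types, attributes):
    """Build one entity type definition as a plain dict.

    `attributes` maps an attribute name to a type name, for example
    "string", "long", or the name of another type for an object reference.
    """
    return {
        "name": name,
        "superTypes": list(super_types),
        "attributeDefs": [
            {"name": attr, "typeName": type_name, "isOptional": True}
            for attr, type_name in attributes.items()
        ],
    }


# Hypothetical custom type that inherits from the pre-defined DataSet type.
sample_table = make_entity_def(
    "sample_table",
    super_types=["DataSet"],
    attributes={"rowCount": "long", "storageFormat": "string"},
)

# Assumed shape of the request body for registering new types.
typedefs_payload = {"entityDefs": [sample_table]}

print(json.dumps(typedefs_payload, indent=2))
```

The payload is only constructed and printed here; sending it to a running Atlas server is left out so the sketch stays runnable without one.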

---+++ Classification
   * Ability to dynamically create classifications - like PII, EXPIRES_ON, DATA_QUALITY, SENSITIVE
   * Classifications can include attributes - like expiry_date attribute in EXPIRES_ON classification
   * Entities can be associated with multiple classifications, enabling easier discovery and security enforcement
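
The sketch below shows, under stated assumptions, how classification definitions and the classifications attached to one entity might be represented as data. The helper names =make_classification_def= and =tag= are hypothetical, the field names (=attributeDefs=, =typeName=, =attributes=) are assumptions about the v2 REST JSON, and the expiry date is a placeholder value.

```python
def make_classification_def(name, attributes=None):
    """Build one classification definition; `attributes` maps name -> type name."""
    return {
        "name": name,
        "attributeDefs": [
            {"name": attr, "typeName": type_name, "isOptional": True}
            for attr, type_name in (attributes or {}).items()
        ],
    }


def tag(name, **attribute_values):
    """Build one classification instance to associate with an entity."""
    return {"typeName": name, "attributes": attribute_values}


# Definitions: PII has no attributes, EXPIRES_ON carries an expiry_date.
classification_defs = [
    make_classification_def("PII"),
    make_classification_def("EXPIRES_ON", {"expiry_date": "date"}),
]

# A single entity can carry several classifications at once.
entity_classifications = [
    tag("PII"),
    tag("EXPIRES_ON", expiry_date="2030-01-01"),  # placeholder date
]

print([c["typeName"] for c in entity_classifications])
```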

---+++ Lineage
   * Intuitive UI to view lineage of data as it moves through various processes
   * REST APIs to access and update lineage

---+++ Search/Discovery
   * Intuitive UI to search entities by type, classification, attribute value or free-text
   * Rich REST APIs to search by complex criteria
   * SQL-like query language to search entities: the Domain Specific Language (DSL)
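
To give a feel for a DSL search, the sketch below builds the URL for one query without sending it. The server address, the =/api/atlas/v2/search/dsl= path, the =query=, =limit= and =offset= parameters, and the sample query text are all assumptions for illustration; consult the Advanced Search page and the REST API Documentation for the authoritative syntax.

```python
from urllib.parse import urlencode

ATLAS_BASE_URL = "http://localhost:21000"     # assumed local Atlas server address
DSL_SEARCH_PATH = "/api/atlas/v2/search/dsl"  # assumed v2 DSL search path


def dsl_search_url(query, limit=10, offset=0):
    """Return a URL-encoded DSL search URL for the given query text."""
    params = urlencode({"query": query, "limit": limit, "offset": offset})
    return f"{ATLAS_BASE_URL}{DSL_SEARCH_PATH}?{params}"


# Hypothetical query: find Hive tables with a given name.
url = dsl_search_url('hive_table where name = "customers"')
print(url)
```

Building the URL separately from issuing the request keeps the encoding step easy to inspect, since DSL queries usually contain spaces and quotes.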

---+++ Security & Data Masking
   * Integration with Apache Ranger enables authorization/data-masking based on classifications associated with entities in Apache Atlas. For example:
      * who can access data classified as PII, SENSITIVE
      * customer-service users can only see last 4 digits of columns classified as NATIONAL_ID

---++ Getting Started

   * [[InstallationSteps][Build & Install]]
   * [[QuickStart][Quick Start]]

---++ Documentation

   * [[Architecture][High Level Architecture]]
   * [[TypeSystem][Type System]]
   * [[Search - Basic][Basic Search]]
   * [[Search - Advanced][Advanced Search]]
   * [[security][Security]]
   * [[Authentication-Authorization][Authentication and Authorization]]
   * [[Configuration][Configuration]]
   * Notification
      * [[Notification-Entity][Entity Notification]]
   * Bridges
      * [[Bridge-Hive][Hive Bridge]]
      * [[Bridge-Sqoop][Sqoop Bridge]]
      * [[Bridge-Falcon][Falcon Bridge]]
      * [[StormAtlasHook][Storm Bridge]]
   * [[HighAvailability][Fault Tolerance And High Availability Options]]

---++ API Documentation

   * <a href="api/v2/index.html">REST API Documentation</a>
   * [[Import-Export-API][Export & Import REST API Documentation]]
   * <a href="../api/rest.html">Legacy API Documentation</a>

---++ Developer Setup Documentation
   * [[EclipseSetup][Developer Setup: Eclipse]]

#LicenseInfo
---+ Licensing Information

Atlas is distributed under the [[http://www.apache.org/licenses/][Apache License 2.0]].