Commit 02448440 by Sarath Subramanian

ATLAS-2687: Create twiki documentation about classification propagation

parent f65aecee
---+ Classification Propagation
Classification propagation enables classifications associated to an entity to be automatically associated with other
related entities of the entity. This is very useful in dealing with scenarios where a dataset derives it data from other
datasets - like a table loaded with data in a file, a report generated from a table/view, etc. For example, when a table
is classified as *"PII"*, tables or views that derive data from this table (via CTAS or ‘create view’ operation) will be
automatically classified as *"PII"*.
---++ Use Cases
Consider the following lineage where data from a 'hdfs_path' entity is loaded into a table, which is further made
available through views. We will go through various scenarios to understand the classification propagation feature.
<img src="images/twiki/classification-propagation-1.png"/>
---++ Add classification to an entity
When classification ‘PII’ is added to 'hdfs_path' entity, the classification is propagated to all impacted entities in the
lineage path, including 'employees' table, views 'us_employees' and 'uk_employees' - as shown below.
<img src="images/twiki/classification-propagation-2.png"/>
---++ Update classification associated with an entity
Any updates to classifications associated with an entity will be seen in all entities the classification is
propagated to as well.
<img src="images/twiki/classification-propagation-3.png"/>
---++ Remove classification associated with an entity
When a classification is deleted from an entity, the classification will be removed from all entities the classification
is propagated to as well.
<img src="images/twiki/classification-propagation-4.png"/>
---++ Add lineage between entities
When lineage is added between entities, for example to capture loading of data in a file to a table, the classifications
associated with the source entity are propagated to all impacted entities as well.
For example, when a view is created from a table, classifications associated with the table are propagated to the newly
created view as well.
<img src="images/twiki/classification-propagation-5.png"/>
---++ Delete an entity
When an entity is deleted, classifications associated with this entity will be removed from all entities the
classifications are propagated to.
For example. when _employees_ table is deleted, classifications associated with this table are removed from
'employees_view' view.
<img src="images/twiki/classification-propagation-6.png"/>
---++ Control Propagation
Apache Atlas provides few options to control whether/where a classification is propagated.
This section details available options.
---++ Propagate flag in classification
Each association of classification to an entity has a boolean flag that controls whether the classification is
propagated or not. When a classification is associated with an entity, this flag is set to ‘true’ i.e. the classification
will be propagated to all impacted entities. This flag can be updated to desired value during initial association or later.
<img src="images/twiki/classification-propagation-7.png"/>
---++ Propagate flag in lineage edge
Apache Atlas supports a flag at lineage edge to enable/disable propagation of classifications through the edge. By default,
the propagation is enabled for lineage edges. When the flag is turned off, no classification will be propagated through
this edge; and propagation of currently propagated classifications through the edge will be reevaluated, so that they can
be removed from impacted entities. When the flag is turned on, propagation of classifications of the source entity will
be reevaluated, so that they can be propagated to all impacted entities.
---++ Block propagation of specific classifications in lineage edge
Apache Atlas supports blocking propagation of specific classifications in at lineage edges. This can be useful, for example,
to handle scenarios like: a column classified as PII is masked when creating a view; in such scenario, if corresponding
column in the created view might not have PII, hence the propagation of PII classification should be blocked. This can be
done by updating the lineage edge to add the PII classification in _‘blocked propagated classifications’_ list.
Classifications in blocked propagated classifications will not be propagated in the derivative/downstream entities.
<img src="images/twiki/classification-propagation-8.png"/>
---++ Notifications and Audit
When propagated classifications are added/update/deleted, Apache Atlas sends notifications to 'ATLAS_ENTITIES' topic for
each entity affected by the propagation.
---++ Glossary
When a classification is associated with a glossary term, the classification is automatically propagated to all entities
associated with the term.
\ No newline at end of file
......@@ -58,6 +58,7 @@ capabilities around these data assets for data scientists, analysts and the data
* [[Atlas-Authentication][Authentication]]
* [[Atlas-Authorization-Model][Atlas Authorization Model]]
* [[Configure-simple-authorizer][Steps to configure Atlas Simple Authorizer]]
* [[ClassificationPropagation][Classification Propagation]]
* [[Configuration][Configuration]]
* [[Notifications][Notifications]]
* Hooks & Bridges
......
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment