---+ Falcon Atlas Bridge

---++ Falcon Model
The default Falcon model is defined in org.apache.atlas.falcon.model.FalconDataModelGenerator. It defines the following type:
<verbatim>
falcon_process(ClassType) - super types [Process] - attributes [timestamp, owned-by, tags]
</verbatim>

One falcon_process entity is created for each cluster on which the Falcon process is defined.

Entities are created and de-duplicated using a unique qualified name. The qualified name provides a namespace and can also be used for querying and lineage. The unique attributes are:
   * falcon_process - attribute name - <process name>@<cluster name>
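As an illustration of the naming rule above, the following is a minimal Python sketch, not the bridge's actual code. The function names and the dictionary layout are hypothetical; only the =&lt;process name&gt;@&lt;cluster name&gt;= format and the one-entity-per-cluster rule come from this page.

```python
def falcon_process_qualified_name(process_name: str, cluster_name: str) -> str:
    """Build a name in the <process name>@<cluster name> form."""
    if not process_name or not cluster_name:
        raise ValueError("both process name and cluster name are required")
    return f"{process_name}@{cluster_name}"


def entities_for_process(process_name: str, cluster_names: list) -> dict:
    """Return one illustrative entity per cluster, keyed by qualified name.

    Keying by the qualified name means a repeated cluster does not
    produce a second entity, which mirrors the de-duplication rule.
    The dict layout is a placeholder, not the Atlas entity format.
    """
    entities = {}
    for cluster in cluster_names:
        name = falcon_process_qualified_name(process_name, cluster)
        entities[name] = {"type": "falcon_process", "name": name}
    return entities
```

For example, a process named =ingest= defined on clusters =primary= and =backup= would yield two entities, =ingest@primary= and =ingest@backup=.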

---++ Falcon Hook
Falcon supports listeners on entity submission. The Atlas hook uses such a listener to add entities to Atlas, using the model defined in org.apache.atlas.falcon.model.FalconDataModelGenerator.
The hook submits each request to a thread pool executor so that it does not block command execution. The worker thread sends the entities as a message to the notification server; the Atlas server reads these messages and registers the entities.

To set up the hook in Falcon:
   * Add 'org.apache.falcon.atlas.service.AtlasService' to application.services in <falcon-conf>/startup.properties
   * Link the Falcon hook jars into the Falcon classpath - 'ln -s <atlas-home>/hook/falcon/* <falcon-home>/server/webapp/falcon/WEB-INF/lib/'
   * In <falcon-conf>/falcon-env.sh, set an environment variable as follows:
     <verbatim>
     export FALCON_SERVER_OPTS="$FALCON_SERVER_OPTS -Datlas.conf=<atlas-conf>"
     </verbatim>
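The hand-off described at the start of this section (listener, thread pool, notification server, Atlas server) can be sketched roughly as follows. This is a conceptual Python illustration only; the real hook is Java code, and every name here (=notifications=, =publish=, =on_process_submitted=, =atlas_server_drain=) is hypothetical. An in-memory queue stands in for the notification server.

```python
import queue
from concurrent.futures import ThreadPoolExecutor

# Stand-in for the notification server (assumption: in-memory queue).
notifications = queue.Queue()

# Worker pool so that submitting an entity does not block the caller.
executor = ThreadPoolExecutor(max_workers=5)


def publish(entity: dict) -> None:
    """Send one entity message to the notification stand-in."""
    notifications.put(entity)


def on_process_submitted(entity: dict, synchronous: bool = False):
    """Listener callback: publish inline, or hand off to the pool.

    Returns a Future in the asynchronous case and None otherwise.
    """
    if synchronous:
        publish(entity)
        return None
    return executor.submit(publish, entity)


def atlas_server_drain(registry: dict) -> dict:
    """Stand-in for the Atlas server reading messages and registering entities."""
    while True:
        try:
            entity = notifications.get_nowait()
        except queue.Empty:
            return registry
        registry[entity["name"]] = entity
```

In the asynchronous case the caller gets control back immediately and the message is published from a worker thread; the consumer side registers whatever messages have arrived when it runs.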

The following properties in <atlas-conf>/atlas-application.properties control the thread pool and notification details:
   * atlas.hook.falcon.synchronous - boolean; set to true to run the hook synchronously. Default: false
   * atlas.hook.falcon.numRetries - number of retries on notification failure. Default: 3
   * atlas.hook.falcon.minThreads - core number of threads in the pool. Default: 5
   * atlas.hook.falcon.maxThreads - maximum number of threads in the pool. Default: 5
   * atlas.hook.falcon.keepAliveTime - keep-alive time in milliseconds. Default: 10
   * atlas.hook.falcon.queueSize - queue size for the thread pool. Default: 10000
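To make the defaults concrete, here is a small Python sketch that reads these keys from properties-style text and falls back to the defaults listed above. It is an illustration under simplifying assumptions, not how Atlas itself loads configuration: the parser handles only plain =key=value= lines and =#= comments, and the names =FalconHookSettings=, =parse_properties= and =load_settings= are hypothetical.

```python
from dataclasses import dataclass

PREFIX = "atlas.hook.falcon."


@dataclass(frozen=True)
class FalconHookSettings:
    # Defaults copied from the property list on this page.
    synchronous: bool = False
    num_retries: int = 3
    min_threads: int = 5
    max_threads: int = 5
    keep_alive_ms: int = 10
    queue_size: int = 10000


def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def load_settings(props: dict) -> FalconHookSettings:
    """Overlay any configured values on top of the documented defaults."""
    d = FalconHookSettings()

    def get_int(name: str, default: int) -> int:
        return int(props.get(PREFIX + name, default))

    sync_raw = props.get(PREFIX + "synchronous", str(d.synchronous))
    return FalconHookSettings(
        synchronous=sync_raw.lower() == "true",
        num_retries=get_int("numRetries", d.num_retries),
        min_threads=get_int("minThreads", d.min_threads),
        max_threads=get_int("maxThreads", d.max_threads),
        keep_alive_ms=get_int("keepAliveTime", d.keep_alive_ms),
        queue_size=get_int("queueSize", d.queue_size),
    )


# Example fragment: only two keys are set, the rest use defaults.
sample = """
# run the hook inline and allow a larger pool
atlas.hook.falcon.synchronous=true
atlas.hook.falcon.maxThreads = 8
"""
settings = load_settings(parse_properties(sample))
```

Any key that is absent from the file keeps its documented default, so an empty configuration yields a non-synchronous hook with a fixed pool of 5 threads and a queue of 10000 entries.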

Refer to [[Configuration][Configuration]] for notification-related configuration.


---++ Limitations
   * Only process entity creation is currently handled. The model will be expanded to include all Falcon metadata.
   * In the Falcon cluster entity, the cluster name should be uniform across components such as Hive, Falcon, and Sqoop. When used with Ambari, the Ambari cluster name should be used for the cluster entity.