---+ Falcon Atlas Bridge

---++ Falcon Model
The default Falcon model is defined in org.apache.atlas.falcon.model.FalconDataModelGenerator. It defines the following types:
<verbatim>
falcon_cluster(ClassType) - super types [Infrastructure] - attributes [timestamp, colo, owner, tags]
falcon_feed(ClassType) - super types [DataSet] - attributes [timestamp, stored-in, owner, groups, tags]
falcon_feed_creation(ClassType) - super types [Process] - attributes [timestamp, stored-in, owner]
falcon_feed_replication(ClassType) - super types [Process] - attributes [timestamp, owner]
falcon_process(ClassType) - super types [Process] - attributes [timestamp, runs-on, owner, tags, pipelines, workflow-properties]
</verbatim>

One falcon_process entity is created for every cluster that the falcon process is defined for.

Entities are created and de-duplicated using the unique qualifiedName attribute. The qualifiedName also provides a namespace and can be used for querying and lineage. The unique attribute values are:
   * falcon_process - <process name>@<cluster name>
   * falcon_cluster - <cluster name>
   * falcon_feed - <feed name>@<cluster name>
   * falcon_feed_creation - <feed name>
   * falcon_feed_replication - <feed name>
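The naming scheme above can be sketched in a few lines of Java. This is only an illustration: the class FalconQualifiedNames and its methods are hypothetical and are not part of the Atlas Falcon bridge; the entity and cluster names are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for illustration; not part of the Atlas Falcon bridge. */
final class FalconQualifiedNames {
    private FalconQualifiedNames() {}

    /** falcon_process and falcon_feed use "name@cluster name" as qualifiedName. */
    static String clusterScoped(String entityName, String clusterName) {
        return entityName + "@" + clusterName;
    }

    /** One falcon_process qualifiedName per cluster the process is defined for. */
    static List<String> processNames(String processName, List<String> clusterNames) {
        List<String> names = new ArrayList<>();
        for (String cluster : clusterNames) {
            names.add(clusterScoped(processName, cluster));
        }
        return names;
    }

    public static void main(String[] args) {
        // Made-up process and cluster names.
        System.out.println(processNames("clean-logs", Arrays.asList("primary", "backup")));
        // prints [clean-logs@primary, clean-logs@backup]
    }
}
```

Because the qualifiedName is unique, submitting the same process for the same cluster again resolves to the same entity rather than creating a duplicate.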

---++ Falcon Hook
Falcon supports listeners on Falcon entity submission. Atlas uses this to add entities using the model defined in org.apache.atlas.falcon.model.FalconDataModelGenerator.
The hook submits each request to a thread pool executor to avoid blocking command execution. The worker thread sends the entities as a message to the notification server; the Atlas server reads these messages and registers the entities.

Set up the hook as follows:
   * Add 'org.apache.atlas.falcon.service.AtlasService' to application.services in <falcon-conf>/startup.properties
   * Link the Falcon hook jars into the Falcon classpath - 'ln -s <atlas-home>/hook/falcon/* <falcon-home>/server/webapp/falcon/WEB-INF/lib/'
   * In <falcon-conf>/falcon-env.sh, set an environment variable as follows:
     <verbatim>
     export FALCON_SERVER_OPTS="<atlas-home>/hook/falcon/*:$FALCON_SERVER_OPTS"
     </verbatim>

The following properties in <atlas-conf>/atlas-application.properties control the thread pool and notification details:
   * atlas.hook.falcon.synchronous - boolean, true to run the hook synchronously. default false
   * atlas.hook.falcon.numRetries - number of retries for notification failure. default 3
   * atlas.hook.falcon.minThreads - core number of threads. default 5
   * atlas.hook.falcon.maxThreads - maximum number of threads. default 5
   * atlas.hook.falcon.keepAliveTime - keep alive time in msecs. default 10
   * atlas.hook.falcon.queueSize - queue size for the threadpool. default 10000

Refer to [[Configuration][Configuration]] for notification-related configuration.
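As a rough sketch of how the thread pool properties above could map onto a java.util.concurrent.ThreadPoolExecutor, consider the following. The class FalconHookExecutorSketch, its methods, and the choice of a bounded LinkedBlockingQueue are assumptions made for illustration; the actual hook implementation may be wired differently. Only the property names and defaults come from the list above.

```java
import java.util.Properties;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch; the real Falcon hook may be wired differently. */
final class FalconHookExecutorSketch {
    private FalconHookExecutorSketch() {}

    /** Reads an integer property, falling back to the documented default. */
    static int intProp(Properties conf, String key, int defaultValue) {
        return Integer.parseInt(conf.getProperty(key, String.valueOf(defaultValue)));
    }

    /** Builds a pool from the atlas.hook.falcon.* settings (assumed mapping). */
    static ThreadPoolExecutor newExecutor(Properties conf) {
        int minThreads = intProp(conf, "atlas.hook.falcon.minThreads", 5);
        int maxThreads = intProp(conf, "atlas.hook.falcon.maxThreads", 5);
        long keepAliveMs = intProp(conf, "atlas.hook.falcon.keepAliveTime", 10);
        int queueSize = intProp(conf, "atlas.hook.falcon.queueSize", 10000);
        return new ThreadPoolExecutor(minThreads, maxThreads, keepAliveMs,
                TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>(queueSize));
    }

    static boolean isSynchronous(Properties conf) {
        return Boolean.parseBoolean(conf.getProperty("atlas.hook.falcon.synchronous", "false"));
    }

    /** Runs the notification inline when synchronous, otherwise hands it to the pool. */
    static void dispatch(Properties conf, ThreadPoolExecutor pool, Runnable notification) {
        if (isSynchronous(conf)) {
            notification.run();
        } else {
            pool.execute(notification); // the caller is not blocked by the send
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Properties conf = new Properties(); // empty: all defaults apply
        ThreadPoolExecutor pool = newExecutor(conf);
        dispatch(conf, pool, () -> System.out.println("entity message sent"));
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

With the defaults, minThreads and maxThreads are equal, so the pool has a fixed size of 5 and up to 10000 pending requests can be queued before further submissions are rejected.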


---++ Limitations
   * In the Falcon cluster entity, the cluster name should be the same across components such as Hive, Falcon and Sqoop. When used with Ambari, the Ambari cluster name should be used for the cluster entity.