Unraveling the Mystery: Getting Airflow Logs in JSON Format

Are you tired of sifting through verbose log files in Airflow, only to find yourself lost in a sea of irrelevant information? Do you dream of a world where log data is presented in a neat, easy-to-digest JSON format? Well, buckle up, friend, because today we’re going to explore the answer to the burning question: “Is there any out-of-the-box functionality to get the Airflow logs in JSON?”

The Importance of JSON Logs

Before we dive into the solution, let’s take a step back and discuss the significance of JSON logs. JSON (JavaScript Object Notation) has become the de facto standard for data exchange and logging due to its simplicity, readability, and flexibility. With JSON logs, you can:

  • Easily parse and process log data using popular tools like jq, AWS Lambda, or Google Cloud Functions (see the sketch after this list).
  • Visualize log data using powerful visualization tools like Kibana, Grafana, or Tableau.
  • Integrate log data with other systems and services, such as monitoring tools, alerting systems, or machine learning models.
  • Scale your logging infrastructure with ease, since line-delimited JSON is compact and compresses well.
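
As a taste of that first point, here is a minimal sketch of consuming JSON logs in plain Python; the file name and field names are assumptions for illustration:

import json

# Hypothetical log file with one JSON object per line.
with open('task_run.log') as f:
    for line in f:
        record = json.loads(line)
        # Filter on structured fields instead of regex-matching raw text.
        if record.get('levelname') == 'ERROR':
            print(record['asctime'], record['message'])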

The Out-of-the-Box Solution: Airflow’s Built-in JSON Logging

Drumroll, please… Yes, Airflow does ship JSON logging support out of the box! Airflow includes a `JSONFormatter` class in the `airflow.utils.log.json_formatter` module, and if you ship task logs to Elasticsearch, you can switch those logs to JSON directly in your `airflow.cfg` file:

[elasticsearch]
json_format = True
json_fields = asctime, filename, lineno, levelname, message

With `json_format = True`, each task log line is written through `JSONFormatter` using the fields listed in `json_fields`. However, this approach has some limitations:

  • JSON output only applies where the formatter is wired in; with the settings above, that means task logs handled by the Elasticsearch handler, not scheduler or webserver logs.
  • Only log messages that flow through Airflow’s own logging configuration are affected, not logs from external dependencies or plugins that bypass it.
  • Some third-party libraries might not support JSON logging, which can lead to inconsistencies in log formatting.

The Custom Solution: Creating a JSON Logger

If the built-in solution doesn’t quite meet your needs, fear not! We can create a custom JSON logger to get the desired output. We’ll use Python’s built-in `logging` module and the `json` module to write a custom formatter that renders every log record as a JSON object.

import json
import logging

class JSONFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record):
        log_dict = {
            'asctime': self.formatTime(record, self.datefmt),
            'name': record.name,
            'levelname': record.levelname,
            'message': record.getMessage(),
        }
        # Attach the formatted traceback when an exception is being logged.
        if record.exc_info:
            log_dict['exc_info'] = self.formatException(record.exc_info)
        return json.dumps(log_dict)

logger = logging.getLogger('airflow_json_logger')
logger.setLevel(logging.INFO)

handler = logging.StreamHandler()
handler.setFormatter(JSONFormatter())
logger.addHandler(handler)

In this example, we define a custom `JSONFormatter` class that inherits from Python’s built-in `logging.Formatter`. We override the `format` method to build a dictionary from the log record and serialize it with `json.dumps`, so every record comes out as one JSON object per line. Hooking in at the formatter level (rather than subclassing `Logger`) means any handler can emit JSON, and exception tracebacks are captured along the way. Finally, we fetch a named logger, set it to the INFO level, and attach a `StreamHandler` that uses the new formatter.
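
As a quick sanity check (a sketch, assuming the setup above has already run), logging an exception shows how tracebacks come along for the ride:

try:
    1 / 0
except ZeroDivisionError:
    # logger.exception() logs at ERROR level and sets exc_info automatically.
    logger.exception('division failed')
# Emits a single JSON object whose "exc_info" field holds the formatted traceback.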

Configuring Airflow to Use the Custom JSON Logger

To use our custom JSON formatter in Airflow, we need two pieces. `airflow.cfg` is a plain INI file, so it can’t hold a Python dictionary itself; instead, it references a logging config dict by dotted path:

[logging]
logging_config_class = log_config.LOGGING_CONFIG

Then, in a `log_config.py` module that Airflow can import (for example, placed in `$AIRFLOW_HOME/config`, which Airflow adds to `PYTHONPATH`), we extend Airflow’s default logging configuration:

from copy import deepcopy

from airflow.config_templates.airflow_local_settings import DEFAULT_LOGGING_CONFIG

LOGGING_CONFIG = deepcopy(DEFAULT_LOGGING_CONFIG)

LOGGING_CONFIG['formatters']['json_formatter'] = {
    # '()' tells dictConfig which factory to call; use the dotted path
    # to the JSONFormatter class defined earlier.
    '()': 'path.to.JSONFormatter',
}
LOGGING_CONFIG['handlers']['console_handler'] = {
    'class': 'logging.StreamHandler',
    'formatter': 'json_formatter',
}
LOGGING_CONFIG['loggers']['airflow_json_logger'] = {
    'level': 'INFO',
    'handlers': ['console_handler'],
}

In this configuration, we start from Airflow’s `DEFAULT_LOGGING_CONFIG`, register the `json_formatter`, and add a `console_handler` that writes JSON-formatted records from the `airflow_json_logger` logger to the console.
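
Before restarting Airflow, you can smoke-test the dictionary with the standard library; a minimal sketch, assuming `log_config.py` is importable from your current directory or `PYTHONPATH`:

import logging
import logging.config

import log_config  # the module defined above

# dictConfig raises immediately if the config dict is malformed.
logging.config.dictConfig(log_config.LOGGING_CONFIG)
logging.getLogger('airflow_json_logger').info('logging config loaded')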

Putting it all Together

Now that we have our custom JSON logger set up, let’s demonstrate how to use it in an Airflow DAG:

import logging
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2023, 3, 1),
    'retries': 1,
    'retry_delay': timedelta(minutes=5)
}

dag = DAG(
    'json_logger_dag',
    default_args=default_args,
    schedule_interval=timedelta(days=1)
)

def log_json_example():
    # Fetch the logger registered in the custom logging config above.
    logger = logging.getLogger('airflow_json_logger')
    logger.info('This is a JSON-formatted log message')

log_json_task = PythonOperator(
    task_id='log_json_task',
    python_callable=log_json_example,
    dag=dag
)

In this example, we define a DAG that uses our custom JSON logger to log a message in JSON format. When you run this DAG, you should see the log message in JSON format in the Airflow console logs.
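
For a one-off check without the scheduler, you can run `airflow tasks test json_logger_dag log_json_task 2023-03-01`, or simply call the task function directly; here is a sketch of what to expect, assuming the formatter and handler from earlier are active:

# Calling the task function directly also exercises the logger.
log_json_example()
# Expected output (timestamp will vary):
# {"asctime": "2023-03-01 12:00:00,000", "name": "airflow_json_logger",
#  "levelname": "INFO", "message": "This is a JSON-formatted log message"}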

Conclusion

In conclusion, while Airflow does provide an out-of-the-box solution for getting logs in JSON format, it has its limitations. By creating a custom JSON logger, we can overcome these limitations and get the flexibility we need to work with log data in a format that’s easy to parse, process, and visualize. Remember to update your `airflow.cfg` file and configure your DAGs to use the custom logger, and you’ll be enjoying JSON-formatted logs in no time!

Method | Advantages | Disadvantages
--- | --- | ---
Out-of-the-Box JSON Logging | Easy to enable; built-in support | Limited flexibility; only covers Airflow’s own logging mechanism
Custom JSON Logger | Highly flexible; can be tailored to specific needs | Requires additional configuration and coding

So, the next time someone asks you, “Is there any out-of-the-box functionality to get the Airflow logs in JSON?”, you can proudly say, “Yes, and even better, we can create a custom JSON logger to get exactly what we need!”

Frequently Asked Questions

Get ready to dive into the world of Airflow logs in JSON!

Is there a way to get Airflow logs in JSON format directly from the API?

Yes. The Airflow REST API exposes task logs at /api/v1/dags/{dag_id}/dagRuns/{dag_run_id}/taskInstances/{task_id}/logs/{task_try_number}. If you send the request with an Accept: application/json header, the response is a JSON object containing the log content and a continuation token (the log text itself is still whatever format your handlers produced).
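
Here’s a minimal sketch with the `requests` library; the host, credentials, and run ID below are placeholders for illustration:

import requests

# Placeholder host, credentials, and identifiers -- adjust for your deployment.
url = (
    'http://localhost:8080/api/v1/dags/json_logger_dag/dagRuns/'
    'scheduled__2023-03-01T00:00:00+00:00/taskInstances/log_json_task/logs/1'
)
resp = requests.get(url, auth=('admin', 'admin'),
                    headers={'Accept': 'application/json'})
resp.raise_for_status()
print(resp.json()['content'])  # raw log text for try number 1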

Can I use Airflow’s built-in logging mechanism to get logs in JSON format?

Airflow’s default logging configuration does not emit JSON, but the building blocks are there: Airflow ships a `JSONFormatter`, and you can wire it (or a custom formatter like the one in this article) into your logging configuration to get JSON output.

Are there any third-party libraries that can help me get Airflow logs in JSON format?

Yes. General-purpose logging libraries work well here; for example, python-json-logger provides a drop-in `JsonFormatter` that you can plug into Airflow’s logging configuration just like the custom formatter above.
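
A minimal sketch using python-json-logger (`pip install python-json-logger`); the logger name simply matches the examples above:

import logging

from pythonjsonlogger import jsonlogger

handler = logging.StreamHandler()
# The format string selects which record attributes end up in the JSON object.
handler.setFormatter(
    jsonlogger.JsonFormatter('%(asctime)s %(name)s %(levelname)s %(message)s')
)

logger = logging.getLogger('airflow_json_logger')
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.info('hello from python-json-logger')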

Can I read task logs programmatically, for example through the task log handler, in JSON format?

Not directly. The task log handler’s read() method returns the log output as plain text in whatever format it was written, so if you want JSON you still need to emit it at write time, as shown above.

Is it possible to configure Airflow to store logs in JSON format by default?

Unfortunately, Airflow does not store logs in JSON format by default. However, you can swap in a JSON formatter via a custom logging configuration (as shown above) to achieve this.