Querying AWS CloudFront and WAF Logs using AWS Athena — Part II
So, in the previous section, it was discussed up to the point where a Database is created in Athena. We will continue our discussion further on creating Athena tables for CloudFront and WAF, and simple ways of querying the created tables.
As the next step, we can create a table for CloudFront Logs. We will be using the query provided in AWS documentation directly to create this table. The documentation URL has been added under the Reference link [1] for further reading.
Here, you can change the Database and Table name according to the Database created earlier and the naming convention in place for your projects. As an example default.cloudfront_logs can be changed to cf_logs.newproject_cloudfront_logs
Apart from that, it is required to update the S3 bucket location where the CloudFront logs are stored at the location section,
LOCATION ‘s3://CloudFront_bucket_name/CloudFront/’
CREATE EXTERNAL TABLE IF NOT EXISTS default.cloudfront_logs (
`date` DATE,
time STRING,
location STRING,
bytes BIGINT,
request_ip STRING,
method STRING,
host STRING,
uri STRING,
status INT,
referrer STRING,
user_agent STRING,
query_string STRING,
cookie STRING,
result_type STRING,
request_id STRING,
host_header STRING,
request_protocol STRING,
request_bytes BIGINT,
time_taken FLOAT,
xforwarded_for STRING,
ssl_protocol STRING,
ssl_cipher STRING,
response_result_type STRING,
http_version STRING,
fle_status STRING,
fle_encrypted_fields INT,
c_port INT,
time_to_first_byte FLOAT,
x_edge_detailed_result_type STRING,
sc_content_type STRING,
sc_content_len BIGINT,
sc_range_start BIGINT,
sc_range_end BIGINT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LOCATION 's3://CloudFront_bucket_name/CloudFront/'
TBLPROPERTIES ( 'skip.header.line.count'='2' )
The image below shows a sample table created using the above query in the default Database,
Let’s move on and create a table for WAF logs. Here also, we will be using the query provided in the AWS documentation with the changes to required attributes. The documentation URL has been added under the Reference link [2] for further reading.
Here, you can change the Database and Table name according to the Database created earlier and the naming convention in place for your projects. As an example, waf_logs can be changed to newproject_waf_logs
Apart from that, it is required to update the S3 bucket location where the WAF logs are stored at the location section,
LOCATION ‘s3://waflog_bucket_name/waflogs/’
CREATE EXTERNAL TABLE `waf_logs`(
`timestamp` bigint,
`formatversion` int,
`webaclid` string,
`terminatingruleid` string,
`terminatingruletype` string,
`action` string,
`terminatingrulematchdetails` array<
struct<
conditiontype:string,
location:string,
matcheddata:array<string>
>
>,
`httpsourcename` string,
`httpsourceid` string,
`rulegrouplist` array<
struct<
rulegroupid:string,
terminatingrule:struct<
ruleid:string,
action:string,
rulematchdetails:string
>,
nonterminatingmatchingrules:array<
struct<
ruleid:string,
action:string,
rulematchdetails:array<
struct<
conditiontype:string,
location:string,
matcheddata:array<string>
>
>
>
>,
excludedrules:array<
struct<
ruleid:string,
exclusiontype:string
>
>
>
>,
`ratebasedrulelist` array<
struct<
ratebasedruleid:string,
limitkey:string,
maxrateallowed:int
>
>,
`nonterminatingmatchingrules` array<
struct<
ruleid:string,
action:string
>
>,
`requestheadersinserted` string,
`responsecodesent` string,
`httprequest` struct<
clientip:string,
country:string,
headers:array<
struct<
name:string,
value:string
>
>,
uri:string,
args:string,
httpversion:string,
httpmethod:string,
requestid:string
>,
`labels` array<
struct<
name:string
>
>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
'paths'='action,formatVersion,httpRequest,httpSourceId,httpSourceName,labels,nonTerminatingMatchingRules,rateBasedRuleList,requestHeadersInserted,responseCodeSent,ruleGroupList,terminatingRuleId,terminatingRuleMatchDetails,terminatingRuleType,timestamp,webaclId')
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://athenawaflogs/WebACL/'
The image below shows a sample table created using the above query in the default Database,
Now, the Athena tables have been created for CloudFront and WAF, let’s move into the querying part. In the first section, we have created an S3 bucket for saving the query results along with the Workspace creation. If you did not create the S3 bucket, it is required to create the bucket and configure it accordingly before running the queries.
Running a simple query against the CloudFront and WAF tables just created would simply look like below,
SELECT *
FROM "default"."newproject_cloudfront_logs"
LIMIT 5SELECT *
FROM "default"."newproject_waf_logs"
LIMIT 5
Let’s look into a more practical use case where you have a CloudFront distribution and you are getting the following error screen. You need to check the WAF logs using the Athena tables created for further investigation,
Here, you can see that there’s a string named Request ID, and this is a unique one that can be used in querying the WAF logs for a specific record. The following query can be used to retrieve the specific record caused the above error,
SELECT *
FROM "default"."newproject_waf_logs"
WHERE httprequest.requestid = 'AF4z0A4j7WQSBxYOBERq1RiJalZKCenyR197pVMd5vD7aD-tnn9hxQ=='
There are many more ways to customize and query CloudFront and WAF logs through Athena. The above queries are just the simplest of queries shown for reference purposes. You can find more information from AWS documentation as well as other resources available. Also, it is not necessary to follow these steps in the same order as there can be different methods with additional enhancements you can always try out once you are familiar with Athena Databases, tables, and querying.
References
[1] https://docs.aws.amazon.com/athena/latest/ug/cloudfront-logs.html
[2] https://docs.aws.amazon.com/athena/latest/ug/waf-logs.html
[3] https://docs.aws.amazon.com/athena/latest/ug/waf-logs.html#query-examples-waf-logs
[4] https://docs.aws.amazon.com/athena/latest/ug/cloudfront-logs.html#query-examples-cloudfront-logs