Megatron FAQ
Q: How do I create a new configuration (job type)?
A: Copy an existing configuration that is similar; look in the job-type directory ("conf/job-type"). The following properties will probably need to be modified:
- "parser.lineRegExp": Regular expression with variables that defines how to parse each line in the input file
- "parser.timestampFormat" and "parser.item.logTimestamp": Define how to handle a timestamp
- "filter.*": Which filters are needed?
- "mail.*": Mail templates
Tips:
- A job type (configuration) is the same as the filename minus ".properties". Thus, "conf/job-type/ip-flowing.properties" has the job type "ip-flowing", which may be specified on the command line.
- If a property is unclear, check its comment in megatron-globals.properties. Search for the property in all "conf/job-type/*.properties" files to see how it is used.
Q: How do I debug a new configuration (job type)?
A: The most common problem is a parsing error. Search the log file for the following text: "Expanded reg-exp (parser.lineRegExp)". This log line contains the expression from "parser.lineRegExp" with its variables expanded (each variable is itself a regular expression). Test the expanded regular expression against a couple of lines from the input file; each group should match one variable. An interactive regular expression tester is very handy for this, and a good one is the Eclipse plugin QuickREx. The following steps install QuickREx in Eclipse:
- Help - Install New Software...
- Paste the following URL in the "Work with:" input field and then press "Add...":
http://www.bastian-bergerhoff.com/eclipse/features
- Accept the license and complete the remaining installation steps
- Window - Show View - Other..., then select "QuickREx" in the tree
Other popular regular expression tools are the following:
- The Regulator - Windows
- Kodos - Cross-platform (Python)
- Kiki - Ubuntu
- gskinner.com/RegExr - Online Flash application
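If no interactive tester is at hand, the expanded expression can also be checked with a few lines of Java. The sketch below is only an illustration: the class name LineRegExpCheck, the pattern, and the sample lines are made up, and it uses Matcher.find(), which may differ from how Megatron itself applies the expression. Paste the expression from the log file in place of the hypothetical one.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LineRegExpCheck {

    /** Returns the captured groups of the first match in the line, or null if nothing matches. */
    static String[] groups(Pattern pattern, String line) {
        Matcher matcher = pattern.matcher(line);
        if (!matcher.find()) {
            return null;
        }
        String[] result = new String[matcher.groupCount()];
        for (int i = 1; i <= matcher.groupCount(); i++) {
            result[i - 1] = matcher.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical expanded expression: IP address, port, free text.
        Pattern pattern = Pattern.compile("^(\\d{1,3}(?:\\.\\d{1,3}){3})\\s+(\\d+)\\s+(.*)$");
        // Hypothetical sample lines; use real lines from the input file instead.
        String[] lines = { "192.0.2.10 80 some text", "not a valid line" };
        for (String line : lines) {
            String[] captured = groups(pattern, line);
            if (captured == null) {
                System.out.println("NO MATCH: " + line);
                continue;
            }
            System.out.println("MATCH: " + line);
            for (int i = 0; i < captured.length; i++) {
                System.out.println("  group " + (i + 1) + " = " + captured[i]);
            }
        }
    }
}
```

Note that backslashes must be doubled when the expression is pasted into a Java string literal. Lines reported as "NO MATCH" are the ones to examine against the expanded expression.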
Q: How do I turn off debug logging?
A: Edit the file "log4j.properties":
- Replace "DEBUG" with "INFO" for the properties "log4j.rootLogger" and "log4j.logger.se.sitic".
- Remove "CONSOLE" from the property "log4j.appender.se.sitic" to turn off logging to stdout.
In production the following config changes are recommended:
log4j.rootLogger=INFO, FILE
log4j.appender.se.sitic=FILE
log4j.logger.se.sitic=INFO
Q: Processing a file with several thousand lines takes too long. How can I get better performance?
A: DNS queries are the only lookups that are done "over the wire" rather than locally. Thus, fewer or faster DNS queries can result in significantly better performance, which can be achieved in the following ways:
- Use MultithreadedDnsProcessor: DNS requests are done in multiple threads, which can increase performance by about a hundred times. Examples of configurations that use MultithreadedDnsProcessor: ip-flowing.properties and megatron-whois-hostname.properties.
- Fine-tune properties: The following properties affect DNS performance:
  - dnsJava.useDnsJava
  - dnsJava.useSimpleResolver
  - dnsJava.dnsServers
  - dnsJava.timeOut
  - fileProcessor.multithreadedDnsProcessor.noOfThreads
- Filter before decoration: If possible, do the filtering before the decoration step, in which information is added to the record (e.g. a DNS name is added to an IP address). For example, in the configuration stopforumspam all non-Swedish IP addresses are filtered out before decoration. This is done by using the property "filter.preDecorator.classNames.*". One drawback of this method is that IP addresses that are hosted in a country other than Sweden but have a ".se" DNS name will be missed. Use the property "filter.preStorage.classNames.*" to filter after decoration (see e.g. surfcert-ids).
- Turn off DNS queries: Modify the property "decorator.classNames" so that it does not include "HostnameDecorator" (reverse DNS lookup) or "IpAddressDecorator" (forward DNS lookup). Example: the job type ip-flowing-fast skips DNS queries, while ip-flowing does DNS queries.
- Diff input files: If an input source is run regularly, e.g. once a day, the file can be diffed against the previous run so that only the new lines are processed. Use the "DiffProcessor", for example:
  fileProcessor.classNames.0=se.sitic.megatron.fileprocessor.DiffProcessor
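To illustrate why doing DNS lookups in multiple threads helps, here is a minimal Java sketch that resolves a list of addresses with a fixed thread pool, so that slow lookups overlap instead of running one after another. It is not Megatron's MultithreadedDnsProcessor: the class ParallelLookupSketch, its methods, and the sample addresses are hypothetical, and the parameter name noOfThreads only echoes the property listed above.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class ParallelLookupSketch {

    /** Applies the resolver to every address on a fixed thread pool; results keep input order. */
    static List<String> resolveAll(List<String> addresses,
                                   Function<String, String> resolver,
                                   int noOfThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(noOfThreads);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (String address : addresses) {
                tasks.add(() -> resolver.apply(address));
            }
            List<String> result = new ArrayList<>();
            // invokeAll waits for all tasks and returns the futures in task order.
            for (Future<String> future : pool.invokeAll(tasks)) {
                result.add(future.get());
            }
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for lookups", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Lookup failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    /** Reverse lookup with the JDK resolver; falls back to the input if it cannot be resolved. */
    static String reverseLookup(String ip) {
        try {
            return InetAddress.getByName(ip).getCanonicalHostName();
        } catch (UnknownHostException e) {
            return ip;
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample addresses from the documentation ranges.
        List<String> ips = Arrays.asList("192.0.2.1", "198.51.100.7", "203.0.113.5");
        System.out.println(resolveAll(ips, ParallelLookupSketch::reverseLookup, 10));
    }
}
```

Because each lookup spends most of its time waiting for the network, the total time in this sketch is bounded roughly by the slowest lookups per batch of noOfThreads rather than by the sum of all lookups.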
Q: Many references to Sitic exist. What is Sitic?
A: Sitic was renamed to CERT-SE in 2011. We are too lazy to change all Sitic references.