# 0014: Extend the PE field set

- Stage: **X (abandoned)**
- Date: **2021-11-18**

The Portable Executable (PE) sub-fieldset of the `file` top-level fieldset can be extended with additional file attributes to aid in file analysis. This additional file metadata can support malware research as well as software and other application development efforts.

## Stage X

This RFC is not being actively worked on and has been marked as abandoned. Anyone who wishes to advance it in the future should open a new pull request against this proposal.

## Fields

This RFC proposes 25 additional sub-fields within the `file.pe` fieldset.

| Name | Type | Description |
| ---- | ---- | ----------- |
| pe.authentihash | keyword | Authentihash of the PE file. |
| pe.compile_timestamp | date | Compile timestamp of the PE file. |
| pe.compiler | nested | Compiler information. |
| pe.compiler.version | keyword | Version of the compiler. |
| pe.compiler.name | keyword | Name of the compiler. |
| pe.creation_date | date | Extracted from the file's metadata when possible. Indicates when the file was built or compiled. It can be faked by malware authors. |
| pe.entry_point | keyword | Address of the entry point, relative to the image base of the PE file. |
| pe.exports | keyword | List of symbols exported by the PE file. |
| pe.debug | nested | Debug information, if present. |
| pe.debug.offset | keyword | Offset of the debug information. |
| pe.debug.size | keyword | Size of the debug information. |
| pe.debug.type | keyword | Type of debug information generated by the debug options. |
| pe.debug.timestamp | date | Timestamp of the debug information. |
| pe.imports | flattened | List of all imported functions. |
| pe.sections | nested | Data about the sections of the compiled PE binary. |
| pe.sections.chi2 | long | Chi-square statistic of the section's byte distribution. |
| pe.sections.virtual_address | long | Virtual address of the section. |
| pe.sections.entropy | float | Entropy of the section's contents, a measure of randomness. |
| pe.sections.flags | keyword | Flags (characteristics) of the section. |
| pe.sections.name | keyword | Name of the section. |
| pe.sections.raw_size | long | Size of the section, or the size of its initialized data on disk. |
| pe.resources | nested | Information about the resources in the PE file, if any. |
| pe.resources.chi2 | long | Chi-square statistic of the resource's byte distribution. |
| pe.resources.filetype | keyword | File type of the resource. |
| pe.resources.entropy | long | Entropy of the resource's contents, a measure of randomness. |
| pe.resources.sha256 | keyword | SHA256 hash of the resource. |
| pe.resources.language | keyword | Language identifier of the resource. |
| pe.resources.type | keyword | List of resource types. |
| pe.machine_type | keyword | Machine type (target architecture) of the PE file. |
| pe.packers | keyword | List of packers and tools used. |
| pe.rich_header.hash.md5 | keyword | MD5 hash of the PE Rich header. |
| pe.icon | nested | Information about the embedded program icon. |
| pe.icon.hash | nested | Hash information for the embedded program icon. |
| pe.icon.hash.dhash | keyword | Difference hash (dhash) of the icon, used to find files with a visually similar icon or thumbnail. |

[New `pe.yml` fields](pe/pe.yml)

## Usage

In file analysis, particularly malware research, file similarities can be used to link malware samples and families, identify campaigns, and possibly support attribution. Understanding how malware components are reused also helps in interpreting malware telemetry, especially when measuring the impact of newly introduced defensive countermeasures.

For example, if an XDR vendor deploys a new malware model to defeat a specific type of ransomware, and we then observe changes in, or new relationships between, the headers, import tables, packers, etc. of that malware family, we can reasonably assume that the new model is having an impact on the family.
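
As a rough illustration of the similarity analysis described above, the following Python sketch groups samples that share a Rich header hash (`file.pe.rich_header.hash.md5`) and flags pairs whose icon dhashes (`file.pe.icon.hash.dhash`) differ by only a few bits. It assumes documents are nested dictionaries using the proposed field names, that each sample carries a single icon object, that dhash values are hex-encoded 64-bit integers, and that a 4-bit threshold is meaningful. The sample values and the helper names (`get_field`, `dhash_distance`, `group_by_rich_header`, `similar_icons`) are hypothetical and not part of this RFC.

```python
from collections import defaultdict
from itertools import combinations
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical documents using the proposed file.pe field names.
# All hash values are made-up placeholders, not real indicators.
SAMPLES: List[Dict[str, Any]] = [
    {"file": {"hash": {"sha256": "sample-a"},
              "pe": {"rich_header": {"hash": {"md5": "rich-md5-1"}},
                     "icon": {"hash": {"dhash": "f0e4c2d7d4a0b0c0"}}}}},
    {"file": {"hash": {"sha256": "sample-b"},
              "pe": {"rich_header": {"hash": {"md5": "rich-md5-1"}},
                     "icon": {"hash": {"dhash": "f0e4c2d7d4a0b0c1"}}}}},
    {"file": {"hash": {"sha256": "sample-c"},
              "pe": {"rich_header": {"hash": {"md5": "rich-md5-2"}},
                     "icon": {"hash": {"dhash": "0f1b3d282b5f4f3f"}}}}},
]


def get_field(doc: Dict[str, Any], path: str) -> Optional[Any]:
    """Follow a dotted field path such as "file.pe.icon.hash.dhash" through nested dicts."""
    value: Any = doc
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def dhash_distance(a: str, b: str) -> int:
    """Hamming distance (number of differing bits) between two hex-encoded dhash values."""
    return bin(int(a, 16) ^ int(b, 16)).count("1")


def group_by_rich_header(samples: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Map each Rich header MD5 to the SHA256 values of the samples that share it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for doc in samples:
        rich = get_field(doc, "file.pe.rich_header.hash.md5")
        if rich:
            groups[rich].append(get_field(doc, "file.hash.sha256"))
    return dict(groups)


def similar_icons(samples: List[Dict[str, Any]], max_distance: int = 4) -> List[Tuple[str, str]]:
    """Return pairs of samples whose icon dhashes differ by at most max_distance bits."""
    pairs: List[Tuple[str, str]] = []
    for a, b in combinations(samples, 2):
        ha = get_field(a, "file.pe.icon.hash.dhash")
        hb = get_field(b, "file.pe.icon.hash.dhash")
        if ha and hb and dhash_distance(ha, hb) <= max_distance:
            pairs.append((get_field(a, "file.hash.sha256"), get_field(b, "file.hash.sha256")))
    return pairs


if __name__ == "__main__":
    print(group_by_rich_header(SAMPLES))
    print(similar_icons(SAMPLES))
```

With the placeholder data, `sample-a` and `sample-b` end up in the same Rich header group and form the only near-duplicate icon pair. An exact match on the Rich header hash only links samples with identical Rich header contents, whereas a Hamming-distance threshold on dhash tolerates small visual differences between icons; any real threshold would need tuning against actual data.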

As another example, tracking file metadata for specific families can help predict new campaigns when similar metadata appears in new samples. For [example](https://www.bleepingcomputer.com/news/security/maze-ransomware-is-shutting-down-its-cybercrime-operation/), the Maze ransomware family shut down and was repurposed as Egregor.

## Source data

This type of data can be provided by logs from VirusTotal, Reversing Labs, Lockheed Martin's LAIKABOSS, Emerson's File Scanning Framework, Target's Strelka, or other file/malware analysis platforms.

* [VirusTotal Filebeat module PR](https://github.com/elastic/beats/pull/21815)
* [VirusTotal API](https://developers.virustotal.com/v3.0/reference)
* [Emerson FSF](https://github.com/EmersonElectricCo/fsf)
* [Target Strelka](https://github.com/target/strelka)
* [Lockheed Martin LAIKABOSS](https://github.com/lmco/laikaboss)

## Scope of impact

There should be no breaking changes, deprecation strategies, or significant refactoring, as this extends the existing fieldset. While this is likely not a large-scale ECS project, documentation updates would be needed to explain the new fields.

## Concerns

## People

The following people consulted on the contents of this RFC.

* @peasead | author
* @devonakerr | sponsor
* @dcode, @peasead | subject matter expert

## References

* [VirusTotal Filebeat module PR](https://github.com/elastic/beats/pull/21815)
* [VirusTotal API](https://developers.virustotal.com/v3.0/reference)
* [Emerson FSF](https://github.com/EmersonElectricCo/fsf)
* [Target Strelka](https://github.com/target/strelka)
* [Lockheed Martin LAIKABOSS](https://github.com/lmco/laikaboss)

### RFC Pull Requests

* Stage 1: https://github.com/elastic/ecs/pull/1071
* Stage X: https://github.com/elastic/ecs/pull/1670