In July, Apple released its new operating system, Mac OS X Lion, and pulled a Frank Lloyd Wright.The architect would return to the homes he had designed and rearrange the furniture as he saw fit.
Reverse engineering is the direct opposite of building or engineering an application: you break things down bit by bit to see how they actually work. Developers incorporate reverse engineering techniques to solve tasks from investigating bugs in code to ensuring smooth and easy legacy code maintenance.
When reverse engineering software, the operating system it was created for should be one of the first things you pay attention to. In this article, we describe how to decompile macOS software and iOS apps. This tutorial will be useful for developers who want to know more about macOS software and iOS apps reverse engineering.
Contents:
Why do we need reverse engineering? The answer is rather simple.
When you build a piece of software, you usually have all of the source code available and can take a look at the source code at any time. So figuring out how a particular process or feature works shouldn’t be too much of a challenge.
But what if you have an executable and you need to figure out how it works without access to any source code? The solution is obvious: you need to reverse engineer it.
There are several reasons why you might need to use reverse engineering:
Below, we take a closer look at the basic structure of an executable, briefly cover reversing Objective-C and Swift code, list several of the most popular tools for reverse engineering macOS and iOS apps, and give some reverse engineering tips for a number of use cases.
Let’s start with some basics that you need to know before you try to reverse engineer your first executable.
Related services
Professional Reverse Engineering ServicesIf you’ve finally decided to reverse engineer binary, then you should understand that some parts of it probably contain executable code. Therefore, before you even start reversing a piece of software, you need to learn the executable binary structure.
In the world of Mach kernel-based operating systems, it’s common to use the Mach-O executable format. These executables can be inside thin or fat binary files. Here’s how these two types of binaries differ:
We use fat binaries to merge executable code in one single file for different CPU instruction sets.
Here’s the basic structure of a Mach-O executable:
Let’s take a closer look at each component.
Every binary begins with a header. This is a key part of every executable for macOS and iOS. It’s the first part of the executable read by the loader during image loading.
A fat binary begins with a fat header, while a thin binary begins with a mach header. Every header starts with a magicnumber used to identify it.
A fat header describes the locations of mach headers for executables in a binary. A mach header describes general information about the current executable file.
A mach header contains load commands that represent several things crucial for image loading:
Segments are typically large pieces of an executable file mapped by a loader to some location in the virtual address space.
In the image above, you can see a lot of information about the chosen segment:
All segments consist of sections. A section is part of the segment that’s intended to store some specific type of content. For example, the __text section of the __TEXT segment contains executable code, and the __la_symbol_ptr section of the DATA segment contains a table of pointers to so-called lazy external symbols.
Every dynamic library dependency is described by a load command containing the path to the dynamic library binary file and its version.
In addition, load commands contain the following information critical for the operation of executable code:
The main symbol table contains all symbols used in the current executable. Every locally or externally defined symbol or even stub (which can be generated for an external call that executes through an import table) is mentioned here. This table is divided into three parts, showing whether the symbol is debug, local, or external. Every entry in the main symbol table represents a particular part of the executable code by specifying the offset of its name in the string table, type, section ordinal, and other type-specific information.
There’s a string table that contains names of symbols defined in the main symbol table. There’s also a dynamic symbol table that links import table entries to the appropriate symbol. In addition, there’s one more table that contains information used by the dynamic loader for every external symbol.
Read also:
How to Reverse Engineer (Windows) Software the Right Way
A code signature can also be rather helpful when reverse engineering a binary. While a code signature is one of the poorly documented (but still open-source) parts of an executable, its content can be displayed by means of the codesign tool (see the image below).
Code signature data contains a number of important elements:
Let’s take a closer look at each element.
The code directory is a structure that contains miscellaneous information (hash algorithm, table size, size of code pages, etc.) and a table of hashes. The table itself consists of two parts: positive and negative.
The positive part of the table of hashes contains hashes of executable code pages.
The negative part optionally contains hashes of such code signature parts as code signing requirements, resources, and entitlements, as well as a hash of the Info.plist file.
Code signing requirements, resources, and entitlements are just bytestreams of the appropriate files located inside a bundle.
The code signature is an encrypted code directory represented in CMS format.
One more thing you should pay special attention to before you learn how to reverse engineer a macOS or iOS app is the architecture it was designed for. Modern desktop devices usually use x86-64 CPUs. Mobile devices use ARMv7, ARMv7s, ARMv8-A, ARMv8.2-A, ARMv8.3-A, and ARM64 CPUs.
Knowledge of instruction sets is important when reverse engineering algorithms. In addition, it’s good to be familiar with calling conventions and some things specific to ARM-based systems on a chip (SoC), like thumb mode and opcodes format.
Nowadays, all system frameworks and dynamic libraries are merged into a single file called the shared cache. This file is located at the following address: /System/Library/Caches/com.apple.dyld/.
These are the basic things you need to know about before doing any reverse engineering. Now let’s talk about the macOS and iOS reverse engineering tools that can help you on this journey.
Read also:
Restoring Classes – Useful Tips for Software Reverse Engineers
Below are standard command-line tools for reverse engineering iOS and macOS apps. These tools are available out of the box on Mac:
In addition, there are several third-party reverse engineering utilities:
Let’s look closer at each of these utilities.
IDA (Interactive DisAssembler) is one of the most famous and widely used reverse engineering tools. IDA is a disassembler and debugger that’s suitable for performing complex research of executables. It’s a cross-platform tool that runs on macOS, Windows, and Linux.
IDA can be used for disassembling software designed for macOS, Windows, and Linux platforms. The program has a free evaluation version with limited functionality. There’s also a paid version, IDA Pro, which supports a wider range of processors and plugins.
MachOView is a utility that works similarly to the otool and nm console tools. The key difference is that MachOView does have a GUI, so you can browse the structure of mach-o files in a more comfortable way. In fact, MachOView was used to make most of the screenshots you see in this article. MachOView is free to use, but unfortunately, it isn’t always stable.
Class-dump is a free command-line utility for analyzing the Objective-C segment of mach-o files. With class-dump, you can get pretty much the same information as from otool but in the form of standard Objective-C declarations. In particular, class-dump creates declarations for classes, categories, and protocols.
Hopper is an interactive tool for disassembling, decompiling, and debugging software and applications. Similarly to IDA, Hopper has a free version with a limited set of features in addition to a paid version. Hopper was designed for Linux and macOS and works best for retrieving Objective-C specific information from the analyzed binary.
Dsc_extractor is Apple’s own open-source tool for extracting libraries and frameworks from dyld_shared_cache. When extracting data, the utility saves the locations and original names of all extracted objects.
Ghidra is an open-source reverse engineering framework provided by the NSA. It supports macOS, Windows, and Linux. Ghidra can be used as a decompiler, as well as a tool for performing such tasks as assembling/disassembling, graphing, and scripting code. It can be customized with the help of scripts and plugins written in Java or Python.
Read also:
9 Best Reverse Engineering Tools for 2019
Now, let’s look at some of the specifics of reverse engineering code written in particular programming languages. Within this article, we focus on the peculiarities of reverse engineering solutions written in Objective-C and Swift.
Objective-C is commonly used for developing applications for macOS and iOS. It relies on a specific C runtime, which somewhat simplifies the process of reverse engineering.
Let’s consider a simple code from an actual application: