Steps towards debugging and resolving Android bootloops

Many users have good reasons to flash custom ROMs and Android distributions like LineageOS onto their phones: If you make the right choice, using these alternative operating system releases on your mobile device will make your digital life more private, potentially more secure, and - at least for a certain kind of folk - also more fun.

I’ve been using LineageOS (which had been known as CyanogenMod back when I first started using it) since 2013 on all my “daily driver” Android mobiles, because I have come to appreciate its sane choice of defaults and default applications, its genuinely useful added features, and also its engineering quality in general.

If you replace something that is seemingly endlessly validated by corporate development and QA processes (however ineffective they might be ;)) with an alternative that’s being developed by a volunteer community, sometimes, you will find yourself in trouble because of that choice. One manifestation of such trouble can be a device that gets knocked out by something commonly referred to as a bootloop.

I am by no means an Android expert, but I am quite adept at wrangling GNU/Linux, even without all the GNU. Recently, a failed LineageOS upgrade from release 18.1 to 19.1 left my phone caught in a bootloop. This prompted me to do a deeper than usual dive into the Android ecosystem, and I chose to document some of that journey to hopefully help others who find themselves in a similarly dire situation.

What’s a bootloop, and why should I (not) want one?

In my eight years of using LOS/CM, I’ve been confronted with the unpleasant situation of having my device bootloop exactly twice: After powering up, the phone continuously displays its startup animation without ever reaching a stage that allows any touchscreen interaction, no matter how long you wait, until it runs out of power. This is usually very frustrating, since it prevents you from using your phone, and also from getting any data off or out of it.

How do bootloops come to be?

In my case, these bootloop experiences were the direct result of a CyanogenMod/LineageOS upgrade having gone slightly wrong. That’s not a common thing with LineageOS, and it is also not the only potential cause of a bootloop (there are reports on the web of users whose actual hardware failures caused such effects). Another relatively common cause is a mismatch between software components that come from different sources but must match and complement each other perfectly: the “vendor” portion of your firmware, with very device- and driver-specific content, and the rest of the operating system (the “ROM”, i.e., a LineageOS image in ZIP format), which contains most or all of the software you interact with on a daily basis. If such a dependency exists for your phone, it should be documented by the supplier of your Android distribution in the respective installation guide.

My first instance of experiencing a bootloop mishap caught me by surprise many years ago, back in the CyanogenMod period of custom Android distributions. I do not recall the specific details, but the root cause of my phone being unable to interact with me was some uncaught datatype (lack-of-)conversion exception in an application that was always being started by default, which led to that app immediately crashing in an endless loop. I faintly recall deleting a single row in an sqlite database, containing application settings and options (including that one pesky datum that made the app crash), on my device to fix it back then.

This time around, mere days ago, a similar problem caused the SystemUI component to crash, and as a consequence of the effective support I received from the LineageOS user and developer community on their IRC channel, a fix could be devised and implemented, which will hopefully keep other people from getting caught in this particular bootloop trap in the future.

YOLO - just do a factory reset!

Both times when the dreaded bootloop struck - that ancient, but also the very recent one that triggered me to draft this document - I managed to fix the underlying problem without resorting to the “nuclear option”, which is the factory reset that some users cling to as their first line of defense against any such bootlooping ilk. A factory reset wipes all content - all options, all applications, all their associated data and configuration, all configured accounts, etc. - from your device and makes it “as good as new”, at least from a data- and software-only perspective. Consequently, if there’s any data stored on your phone you do not have another copy of somewhere else, that data is permanently lost in the process. For some users who would rather not put all their data into any cloud service provider’s hands, that can be a rather significant step to take.

Also, depending on the root cause for the particular bootloop you are confronted with, a factory reset is not guaranteed to restore your device to working condition: If there’s a fundamental mismatch between vendor firmware and LineageOS image as described above, no number of factory resets will change a thing, and whatever problem you are dealing with will patiently persist.

Determining the bootloop root cause

So in order to deal with an Android device caught in a bootloop, you’re better off trying another approach first: You will want to gather data about the specific circumstances leading to your phone not booting properly any more. With this data collected, you construct - probably with the help and support of others - a hypothesis of what needs to change to fix the observed problem, and then try to apply that supposed fix. If that proves unsuccessful, and you can still muster more patience and ingenuity (and can identify other problems to combat, which might be the actual cause of the bootloop), you repeat this process. Eventually, you’ll either be successful, or out of options. Only then might a factory reset be in order, or even required.

Enter the Android Debug Bridge

Diagnosing a bootloop requires you to interact with your Android device through an uncommon means of access: a protocol and a tool called adb, the Android Debug Bridge. I will not be going into detail on how to get started with this particular program, but will assume you know how to acquire it, and how to use it to connect to a device which has its adb-compatible access enabled. What I am going to show is how to use a recovery image (such as the LineageOS recovery images that the project publishes for each supported device) to enable unauthenticated adb access during bootup, and how to use that to (hopefully) determine the root cause of whatever keeps your phone or tablet from reaching its normal, user-interactive state.

Android Recovery

Your first step on the road towards fixing your device is to boot into the ADB-enabled recovery image installed on it. How this is done varies from model to model, but commonly observed methods involve keeping the Volume Up or Volume Down button(s) pressed while the device powers up. You will need to check your specific model’s installation instructions to know for sure.

Once recovery is booted up, you will need to take care of two things:

  1. ADB needs to be active. In LineageOS recovery, you will find that option in the Advanced menu.
  2. Also, system needs to be mounted. Again, assuming LineageOS recovery, look under Advanced.

Once these preconditions are met, use adb root and adb shell to connect to the device while it’s attached to your PC. If that does not yield a shell prompt in your booted recovery environment on your phone, you will have to get that sorted before having a shot at doing (and fixing) anything. Sorry.
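Before going further, it can help to confirm which state adb reports for the device. What follows is a minimal sketch, not part of my original procedure: the helper name adb_state is made up for illustration, and it merely parses the text that adb devices prints (a header line, followed by one line per device holding a serial number and a state). The serial number in the sample input is invented.

```shell
# adb_state: hypothetical helper that reads `adb devices` output on stdin and
# prints the state of the first listed device (for example "recovery",
# "device", "unauthorized" or "offline"), or "none" if no device is listed.
adb_state() {
  awk '!/^List of devices/ && !/^\*/ && NF >= 2 { print $2; found = 1; exit }
       END { if (!found) print "none" }'
}

# Sample input mimicking a phone booted into recovery; the serial is made up.
printf 'List of devices attached\n0123456789ABCDEF\trecovery\n' | adb_state
```

With a phone attached, the same helper would be fed via adb devices | adb_state. A result of none or unauthorized suggests that the cable, the recovery's ADB setting, or the host-side adb setup needs attention before adb shell can work.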

F…ix the system

Now, we will mess with a number of essential device settings that are stored in a file now made available at /mnt/system/system/build.prop. Usually, the filesystem this file resides on is mounted read-only in your recovery environment. Wanting to edit it requires us to change that. To do so, we remount this filesystem in read-write mode and then create a backup copy of the file by running the following commands:

mount -o remount,rw /mnt/system
cp -a /mnt/system/system/build.prop /mnt/system/system/build.prop.bak

If no errors are reported after executing these two commands, you just successfully enabled write access to the system partition/filesystem, and also created a backup of the file we’re going to mess with for enabling adb access when booting the actual ROM installed on your device.
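If you would rather verify the remount than rely on the absence of error messages, the mount table can be inspected. The sketch below is an assumption-laden illustration: mount_is_rw is a made-up helper name, it expects the usual /proc/mounts layout (device, mount point, filesystem type, options), and the demonstration runs against a fabricated table instead of a real device.

```shell
# mount_is_rw: hypothetical helper; succeeds if the last entry for MOUNTPOINT
# in a mounts table carries the "rw" option. The second argument defaults to
# /proc/mounts and exists mainly so the function can be tried on sample data.
mount_is_rw() {
  mountpoint=$1
  table=${2:-/proc/mounts}
  result=1
  while read -r _dev mp _fs opts _rest; do
    [ "$mp" = "$mountpoint" ] || continue
    case ",$opts," in
      *,rw,*) result=0 ;;
      *)      result=1 ;;
    esac
  done < "$table"
  return "$result"
}

# Demonstration on a made-up mounts table rather than on a real device.
sample=$(mktemp)
printf '/dev/block/dm-0 /mnt/system ext4 rw,seclabel,relatime 0 0\n' > "$sample"
if mount_is_rw /mnt/system "$sample"; then
  echo "/mnt/system is writable"
fi
rm -f "$sample"
```

Inside the recovery shell, the function body alone (without the mktemp demonstration) would be called as mount_is_rw /mnt/system, provided the recovery's shell offers read and case, which I have not verified for every recovery image.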

If the second command errors out, that might mean that the first command also failed, or that your build.prop is located at another path in the file system. You could try locating its actual path by running find / -name build.prop 2>/dev/null, and re-try the above with adjusted command lines.

Modifying build.prop

Since the recovery environment we are presently working in is rather limited, the only editor suitable for performing the work on the text-based configuration file we will be dealing with is sed, the streaming editor. So we’ll run sed with a couple of arguments that will cause it to non-interactively rewrite a few lines in build.prop, and we’re done, like so:

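# Rewrite the three properties that govern adb access in the installed system.
sed -i -e 's/^persist\.sys\.usb\.config=.*/persist.sys.usb.config=mtp,adb/' -e 's/^ro\.adb\.secure=.*/ro.adb.secure=0/' -e 's/^ro\.secure=.*/ro.secure=0/' /mnt/system/system/build.prop
# Print the resulting lines for verification.
grep -E '^persist\.sys\.usb\.config=|^ro\.adb\.secure=|^ro\.secure=' /mnt/system/system/build.prop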

If no errors are reported, and you get three lines that correspond to the values we fed into the sed command line, you’re done here - recovery’s job is finished for today. As soon as these changes get persisted to system, Android will, while booting up, allow for unrestricted adb access to your device.
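One caveat: the sed invocation above only rewrites lines that already exist, so if grep prints fewer than three lines, one of the properties is simply absent from your build.prop. The following sketch shows one way to replace-or-append a property. The helper name set_prop is made up, it assumes keys and values free of characters that are special to sed (such as / and &), and the demonstration works on a scratch file. Whether Android honors an appended line depends on the release and on where else the property is defined, so treat that part as an assumption to verify.

```shell
# set_prop: hypothetical helper that sets KEY=VALUE in a properties file,
# replacing an existing line for KEY or appending one if none exists.
# KEY is assumed to contain only letters, digits, dots and underscores.
set_prop() {
  file=$1
  key=$2
  value=$3
  pattern=$(printf '%s' "$key" | sed 's/\./\\./g')   # escape dots for the regex
  if grep -q "^${pattern}=" "$file"; then
    sed -i -e "s/^${pattern}=.*/${key}=${value}/" "$file"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}

# Demonstration on a scratch file, not on a real build.prop.
demo=$(mktemp)
printf 'ro.secure=1\nro.adb.secure=1\n' > "$demo"
set_prop "$demo" ro.secure 0
set_prop "$demo" ro.adb.secure 0
set_prop "$demo" persist.sys.usb.config mtp,adb
cat "$demo"
rm -f "$demo"
```

In this demonstration, the first two calls rewrite existing lines, while the third appends a line because the scratch file has no persist.sys.usb.config entry.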

Log off of your adb shell, and use the recovery UI to trigger a normal reboot (which will lead you into the next episode of your bootloop adventure - but far better equipped this time around).

Using ADB to access the bootlooping system

Once your phone is up and boasting its loading screen again, running adb logcat should connect you to your regular, non-recovery Android installation and give you access to its debug- and error-logging facilities. Be warned: you will get massive amounts of text shoveled towards you this way, since Android, like every other complex software system built by mankind, messed up terribly and fundamentally with its logging. Still, it’s infinitely better to have this than to have no information at all, so we’re going to deal with it.

Dealing with all that logspam and collecting data

To cope with the copious log output, I recommend having at least a few hundred megabytes available for storing adb logcat data, for analysis and for comparison with potential future attempts at getting things in order. Use output redirection via > on your shell, or piping through tee, to achieve that. A nifty chain of commands I came up with that could help cope with essentially duplicated yet slightly varying messages is an invocation like the following:

adb logcat | sed -r 's/[0-9]+/_/g' | awk '!a[$0]++' | tee -a "/tmp/logcat_$(date +%s)_dedup"

This will mangle all numeric parts of the output, and make sure each occurring line gets printed only once.
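To see what the two filter stages do without a device attached, they can be tried on a couple of sample lines. In this sketch, the function name dedup_log and the log lines are made up; sed -E is used as the more portable spelling of -r.

```shell
# dedup_log: the same sed/awk stages as in the pipeline above, wrapped in a
# function (the name is made up) so they can be tried on sample input.
dedup_log() {
  sed -E 's/[0-9]+/_/g' | awk '!seen[$0]++'
}

# Two made-up log lines that differ only in timestamp and process ID.
printf '%s\n' \
  '05-14 18:23:27.654  2037  2037 E Example: service died' \
  '05-14 18:23:29.101  2099  2099 E Example: service died' \
  | dedup_log
```

Both input lines collapse into one, because after masking they are identical. Keep in mind that the masking also hits digits you may care about, such as source line numbers in stack traces, which is one more reason to keep an unaltered capture as well.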

Sometimes, an approach like this will not be helpful to capture the root cause, and you would rather want to capture a few thousand lines of logcat output unaltered to analyze in bulk. For that, you could use an invocation like

adb logcat | head -n 5000 | tee "/tmp/logcat_$(date +%s)_head"

to limit the amount of data collected to a sample of exactly 5000 lines. With sufficient activity (which, in typical bootloop scenarios, is quite normal since some services will likely be crashing and restarting over and over again), this should conclude within a few seconds to minutes.

Determining the root cause

With that sorted, the really difficult part starts - determining what could be at the root of the chain of events that gets your system into the dire state it’s in.

Truth be told, there’s no surefire way to tell what that is and how to solve it, and only experience and familiarity with Android and development on and for the platform will bode well for success.

Usually, you will be looking for something - a process or system service - that fails repeatedly, in some sort of throttled loop. In my case, during my recent bootloop tragedy, I noticed lots of AndroidRuntime exceptions that read like the following:

05-14 18:23:27.654  2037  2037 E AndroidRuntime: FATAL EXCEPTION: main
05-14 18:23:27.654  2037  2037 E AndroidRuntime: Process: com.android.systemui, PID: 2037
05-14 18:23:27.654  2037  2037 E AndroidRuntime: java.lang.IllegalArgumentException: Unable to retrieve overlay information for org.lineageos.overlay.customization.navbar.nohint
05-14 18:23:27.654  2037  2037 E AndroidRuntime: 	at android.os.Parcel.createExceptionOrNull(Parcel.java:2429)
05-14 18:23:27.654  2037  2037 E AndroidRuntime: 	at android.os.Parcel.createException(Parcel.java:2409)
05-14 18:23:27.654  2037  2037 E AndroidRuntime: 	at android.os.Parcel.readException(Parcel.java:2392)
05-14 18:23:27.654  2037  2037 E AndroidRuntime: 	at android.os.Parcel.readException(Parcel.java:2334)
05-14 18:23:27.654  2037  2037 E AndroidRuntime: 	at android.content.om.IOverlayManager$Stub$Proxy.setEnabled(IOverlayManager.java:703)
[...]

In your case, the root of the problem could look similar - but there’s just no guarantee.
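If your capture contains crash reports of this shape, counting which process names show up in them gives a quick overview of what keeps dying. This is a sketch under the assumption that the crash reports follow the "AndroidRuntime: Process: <name>, PID: <number>" pattern seen in the excerpt; crashing_processes is a made-up helper name, and the second sample line is invented.

```shell
# crashing_processes: hypothetical helper; counts how often each process name
# appears in "AndroidRuntime: Process: <name>, PID: <n>" lines.
# Reads the named files, or stdin if no file is given.
crashing_processes() {
  grep -h 'AndroidRuntime: Process: ' "$@" \
    | sed -E 's/.*Process: ([^,]+), PID: [0-9]+.*/\1/' \
    | sort | uniq -c | sort -rn
}

# First line taken from the excerpt above; the second one is made up.
printf '%s\n' \
  '05-14 18:23:27.654  2037  2037 E AndroidRuntime: Process: com.android.systemui, PID: 2037' \
  '05-14 18:23:31.120  2144  2144 E AndroidRuntime: Process: com.android.systemui, PID: 2144' \
  | crashing_processes
```

Applied to saved captures, this would be called as crashing_processes /tmp/logcat_*. A process that appears many times with changing PIDs is a good candidate for the service that crashes and restarts in a loop.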

Skimming the output and even semi-automatically filtering it for strings (by using grep and friends) that indicate error conditions (“Fatal”, “Error”, “Exception”, “killed”, etc. could make sensible candidates) is a good way to narrow down the scope of this tedious analysis. Also, the fifth column of logcat output represents the log level/severity of the logged event, so filtering for Error or Fatal conditions could yield a lead - using awk, this would be done like so:

awk '{if ($5 == "E" || $5 == "F"){print}}' /tmp/logcat_*

Most of the time, however, there’s too much getting logged at severity Error for this to narrow the merely possible culprits down to the likely ones.
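One way to get a grip on that volume is to count Error and Fatal lines per log tag, so that the noisiest components stand out. The sketch below assumes the default logcat line layout shown earlier, where the sixth column holds the tag followed by a colon; count_errors_by_tag is a made-up name, the third sample line is invented, and tags containing spaces would be cut short by this simple field split.

```shell
# count_errors_by_tag: hypothetical helper; counts lines of severity E or F
# per log tag and prints "<count> <tag>", most frequent first.
# Reads the named files, or stdin if no file is given.
count_errors_by_tag() {
  awk '$5 == "E" || $5 == "F" {
         tag = $6
         sub(/:$/, "", tag)   # strip the colon that follows the tag
         count[tag]++
       }
       END { for (t in count) print count[t], t }' "$@" | sort -rn
}

# Two lines from the excerpt above plus one invented informational line.
printf '%s\n' \
  '05-14 18:23:27.654  2037  2037 E AndroidRuntime: FATAL EXCEPTION: main' \
  '05-14 18:23:27.654  2037  2037 E AndroidRuntime: Process: com.android.systemui, PID: 2037' \
  '05-14 18:23:27.700  1500  1500 I ExampleTag: unrelated informational line' \
  | count_errors_by_tag
```

On real captures, count_errors_by_tag /tmp/logcat_* | head -n 20 would list the twenty tags with the most errors. A high count is only a hint, not proof: some components log errors constantly even on a healthy system.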

At any rate, you may be able to find the proverbial smoking gun this way - but in some (if not most) cases, you will need to tap the knowledge of others, which is what I did to get my bootloop problem sorted, and all my phone’s data back.

Requesting community support

At this point, your best shot is to gather all the data like I hinted at above, and to intelligently engage with the support community of the Android distribution or custom ROM that you want to fix on your device. Channels to do that could be any of Internet Relay Chat (IRC), an on-topic Subreddit, some bulletin board on the World Wide Web, a Discord server… it’s up to you to figure that out.

Once you have found the appropriate venue, make sure to concisely yet completely state your problem, present the facts and the data collected (use a pastebin service like paste.debian.net to make it convenient for others to review it), and… wait, sometimes patiently, for volunteer community members to take note of your request. Once you have their attention and receive support, try your best to ask smart questions about how you should proceed to resolve the mess you found yourself in when this journey started.

Best of luck!

Copyright ©2022 Johannes Truschnigg

This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.

Special thanks to LineageOS contributor LuK1337 for guiding me and helping track down the root cause of my bootloop problem, and for developing and merging the patch that actually fixed it.