Tuesday, January 3, 2012

Cleaning Up Maven Dependencies

I'm currently working on a rather large Java code base which is built with Maven. It has a million lines of Java, 190 Maven modules and 12 web applications. There is a lot of history in a code base like this. For a while I have had a feeling that we probably have some extra Maven dependencies lying around. This happens very easily over time when code is modified and moved around. Also, dependency issues are usually solved by adding more dependencies. I have very rarely seen anybody remove dependencies from a working build.

I had some spare time so I decided to check how many extra dependencies we have and maybe try to remove them. The end result of this journey was quite surprising to me: of the over 3000 dependencies that we had, I could remove over 1500 without breaking anything. Removing extra dependencies has a positive effect on many levels (build time, IDE development, complexity, ...), so if you have a big Maven build somewhere you might want to do the same. I wrote some scripts along the way and decided to open source them (https://github.com/siivonen/maven-cleanup). I also decided to write the following instructions for anyone who wants to clean up their Maven dependencies.

0. Get the scripts to your machine with the following command

    git clone git://github.com/siivonen/maven-cleanup.git

1. Make sure that the build works on your machine

For starters you need a green baseline. Figure out a build command that includes all the necessary tests etc. in your project. This Maven command is used to determine whether everything is still OK after a dependency removal. You save yourself time by selecting a command that runs all the tests and activates all the relevant profiles. The command you choose should have 'install' in it. The default selection is 'mvn clean install'.

Once you know which Maven command you want to use, run a full build using this command on your machine. This build needs to succeed, so do whatever is required to get the build green. In big Maven builds this can be hard sometimes. It is important that every Maven project is built successfully locally before you continue. This way your local repository will have all the needed artifacts, and you know that the build was green at least before any dependency removals.
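If you want to automate the baseline check, a minimal Ruby sketch could look like the one below. The helper name build_green? is hypothetical and is not part of the maven-cleanup scripts; it only wraps Ruby's Kernel#system and treats exit status 0 as green.

```ruby
# Hypothetical helper, not part of the maven-cleanup scripts.
# Runs the given build command in `dir` and reports whether it exited with status 0.
def build_green?(command, dir = ".")
  # Kernel#system returns true on exit status 0, false on a non-zero status,
  # and nil when the command could not be started at all.
  # File::NULL is the platform's null device, so build output is discarded.
  system(command, chdir: dir, out: File::NULL, err: File::NULL) == true
end
```

Under these assumptions, calling build_green?("mvn clean install") in the root of the project before step 2 tells you whether you have a baseline to compare against.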

2. Systematically remove dependencies not needed directly

This is a rather time-consuming and mechanical task, so I wrote a script for it. The script takes a root pom.xml file and a Maven command as parameters. What it does is the following:
  • Finds all sub modules of the given pom.xml file
  • For every pom.xml found, removes one dependency at a time and builds the project with the given Maven command
  • If the build is successful, leaves the dependency out
The full build time in my project is 1.5 hours and I wanted to eliminate unneeded direct dependencies, so I decided to make the builds non-recursive (the -N flag in the command below).

  ./remove-extra-dependencies.rb pom.xml 'mvn clean install -N'
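The loop described above can be sketched roughly as follows for a single pom.xml. This is not the code of remove-extra-dependencies.rb: the function names are hypothetical, the build check is passed in as a block so the sketch does not depend on Maven, and finding the sub modules is left out.

```ruby
require "rexml/document"

# Hypothetical sketch, not the code of remove-extra-dependencies.rb.

# Direct child elements with the given local name (XML namespaces are ignored).
def child_elements(parent, name)
  parent.children.select { |c| c.is_a?(REXML::Element) && c.name == name }
end

# Tries to drop each direct <dependency> of one pom.xml, one at a time.
# The block receives the pom path and must return true when the build is green.
# Returns the "groupId:artifactId" keys of the dependencies that were left out.
def remove_unneeded_dependencies(pom_path)
  removed = []
  index = 0
  loop do
    original = File.read(pom_path)
    doc = REXML::Document.new(original)
    deps_node = child_elements(doc.root, "dependencies").first
    deps = deps_node ? child_elements(deps_node, "dependency") : []
    break if index >= deps.size

    dep = deps[index]
    key = %w[groupId artifactId].map { |n| child_elements(dep, n).first&.text }.join(":")
    deps_node.delete(dep)
    File.write(pom_path, doc.to_s)

    if yield(pom_path)
      removed << key                 # still green: leave the dependency out
    else
      File.write(pom_path, original) # build broke: restore the pom, try the next one
      index += 1
    end
  end
  removed
end
```

With Maven, the block could be something like { |pom| system("mvn clean install -N", chdir: File.dirname(pom)) }. Note that rewriting the file through REXML may change its formatting, which a real cleanup would probably want to avoid.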

The script took 14 hours to loop through all our 3000+ dependencies in 190 pom.xml files. After the script execution there were 1600 dependency removals to commit. Unfortunately, committing those as such would have resulted in a build failure. Removing a dependency from a project causes a build failure in dependent projects that used to get the removed dependency transitively. That's why we often need step 3.

3. Add the missing dependencies

The missing dependencies are ones that a project used to get transitively but no longer does. Scripting this task seemed too complex, so I decided to do it manually. The process was pretty much the following:
  • Run the full build command (the one you ran in step 1)
  • If the build fails because of missing classes:
    • search for the missing dependency in your local repository
      • grep -r --include '*.jar' my.missing.Class ~/.m2/repository
    • add the dependency to the failing Maven module (if only the test compilation/execution is failing, the test scope is enough for that dependency)
    • resume the build
      • mvn clean install -rf :failing-module
  • Repeat this until you get a green build
After this step you should have fewer dependencies but still a green build and working software.
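The search in the local repository can also be done from Ruby. The sketch below uses the same trick as the grep: entry names inside a jar are stored uncompressed, so a byte search for my/missing/Class.class finds the jars that contain the class. The function name jars_mentioning is hypothetical, and the default repository path assumes the standard ~/.m2/repository location.

```ruby
# Hypothetical helper. Lists the jars under a local repository whose raw bytes
# contain the entry name of the given class, much like the grep in step 3.
def jars_mentioning(class_name, repo = File.expand_path("~/.m2/repository"))
  # "my.missing.Class" is stored in the jar as the entry "my/missing/Class.class"
  entry = (class_name.tr(".", "/") + ".class").b
  Dir.glob(File.join(repo, "**", "*.jar")).sort.select do |jar|
    # Reads the whole jar into memory, which is fine for a sketch
    File.binread(jar).include?(entry)
  end
end
```

The directory of each returned jar gives you the groupId, artifactId and version to add to the failing module.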

4. Remove extra dependencyManagement entries

After removing a lot of dependencies you probably have several extra dependency management entries. Removing them is again rather mechanical, so I scripted that. The script takes a pom.xml path as a parameter and:
  • Collects all dependencies of the given module and all its sub modules
  • Loops through all the dependency management entries and removes the entries that are not found among those dependencies
You need to run the script separately for every pom.xml file that has a dependencyManagement section.

    ./remove-extra-dependency-management.rb pom.xml

This script doesn't need to run any Maven builds, so it executes rather fast. You might get a build failure after running this script as well. The script removes all the dependency management entries that are not referenced directly. However, dependency management can also be used to control transitive dependencies of 3rd party libraries that you are not directly referencing. Your build might also rely on dependency management exclusions that are now removed. If you have these build problems, you can solve them case by case in a similar fashion as in step 3. Once you get a green build after this step you are done!
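The pruning in step 4 can be sketched roughly like this. It is not the code of remove-extra-dependency-management.rb: the function names are hypothetical, the list of module poms is passed in by the caller instead of being discovered, and entries are matched on groupId and artifactId only (type and classifier are ignored as a simplification).

```ruby
require "rexml/document"

# Hypothetical sketch, not the code of remove-extra-dependency-management.rb.

# Direct child elements with the given local name (XML namespaces are ignored).
def child_elements(parent, name)
  parent.children.select { |c| c.is_a?(REXML::Element) && c.name == name }
end

# "groupId:artifactId" key of a <dependency> element.
def dependency_key(dep)
  %w[groupId artifactId].map { |n| child_elements(dep, n).first&.text }.join(":")
end

# Removes the dependencyManagement entries of `managing_pom` that no pom in
# `module_poms` declares as a direct dependency. Returns the removed keys.
def prune_dependency_management(managing_pom, module_poms)
  used = module_poms.flat_map do |path|
    root = REXML::Document.new(File.read(path)).root
    child_elements(root, "dependencies")
      .flat_map { |deps| child_elements(deps, "dependency") }
      .map { |dep| dependency_key(dep) }
  end

  doc = REXML::Document.new(File.read(managing_pom))
  removed = []
  child_elements(doc.root, "dependencyManagement").each do |management|
    child_elements(management, "dependencies").each do |deps|
      child_elements(deps, "dependency").each do |dep|
        key = dependency_key(dep)
        next if used.include?(key)
        deps.delete(dep) # nothing references this managed entry directly
        removed << key
      end
    end
  end
  File.write(managing_pom, doc.to_s)
  removed
end
```

The returned keys are a convenient checklist if the build fails afterwards, because the entries that controlled transitive dependencies or carried exclusions are among them.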

14 comments:

  1. sounds cool and ~~ handy sort of

  2. You are a hero. I happen to know this particular project well enough to say there's a lot of mess to be cleaned up. Next, write a script that will re-organize all the code into the optimal structure :)

  3. Great work!! What about using mvn dependency:analyze in step 2?

    Replies
    1. That could work in some projects. To my understanding it only reports dependencies referenced directly from code, so it is equivalent to running 'mvn test-compile' successfully. In our project that would have resulted in a lot of false positives (all the reflection and dependency injection cases, cases that are caught only in integration or browser tests, ...) and therefore a long step 3.

  4. Thanks Samuli, I'm giving this a go now. A couple of things to note:

    1. I'm running on Windows (not by choice), and if anyone else wants to give it a go on Windows it is pretty easy to get the script to run by changing the /dev/null redirection

    2. Sometimes parent modules (or assemblies) in Maven will contain dependencies for the sake of simplifying their descendants - this is the case in my current project. If we run the script as-is, it will remove all dependencies from the parent (structural) pom, thereby breaking most if not all children.

    For example, we have a library of handy unit testing utilities, and this is included at the top level to avoid repeating this in each project. If the cleanup script removes this it breaks all child projects (and the analysis therein will not work).

    My solution to this was to add some code to skip the dependency removal step if the pom contains <packaging>pom</packaging>, and this works for our assembly projects as well. I don't really know Ruby, so you could probably do this more elegantly than I can, but I'll share my changes back if you wish.

    Replies
    1. This sounds like an OK solution and it's simple. Another solution that comes to my mind is to run the script separately for parents and children with different Maven commands. Parent builds will be recursive and child builds will not. This is more complex and time consuming but results in removing the unused dependencies from the parent projects also.

      If you have any improvements to the scripts just ship it in. To me content is more important than code.

    2. How do I change the /dev/null redirection? Can you be more specific, since I know nothing about Ruby? Thanks

    3. I don't have Windows to try this, but maybe this helps you: http://stackoverflow.com/questions/313111/dev-null-in-windows. You can also try installing Cygwin on your Windows machine and running the scripts in the Cygwin console without any modifications.

    4. Change line 53 to: if system("cd #{File.dirname(pom)} && #{ARGV[1]} >nul 2> nul")

  6. I was wondering whether the manual step 3 would still be needed if you executed the command "non-recursive" in "inverse order" of dependencies in a multi-module build.

    So you start with your "leaf module(s)", which have no dependencies on other modules, and go up the tree module by module.

    To automate the whole thing then, there is probably a way to "detect inter module dependencies" automatically, and sort the cleaning accordingly.

    Regards,

    Carsten

  7. This sounds like a good idea. You could probably get a good build order by running 'mvn validate' and reading the 'Reactor Build Order:' from the output.

  8. It is a great piece of work. Saved me a lot of hassle and time. Thank you very much for sharing it.
    Please don't mind me asking, is there a way to detect deployment time errors? The build works fine after cleaning but I see deployment time errors :-(

    Replies
    1. Thanks! Deployment time dependencies are harder. I haven't figured out a good solution to that. Trying to cover all the execution paths of the live application in the 'mvn test' phase helps (you can then use the method described in this blog). The "tests" can just trigger production code and check that no errors happen.

      If you have a good browser test set you can maybe use that. Remove one dependency at a time, run the browser test set and check that everything is working and the log is clean.
