Is conda Free?
Editor’s Note: Since this article was published, we have submitted this change to the Miniconda installer. We have made it simple to disable this data collection if you choose.
Anaconda strives to continually improve the user experience for our customers and communities. We aim to be the premier provider of secure access to thousands of Python and R repositories, packages, and libraries, while also supporting the open-source community that powers those packages. These dual goals are reflected in Anaconda’s products and in our open-source contributions.
A deeper understanding of repository usage patterns enhances our ability to serve both our free and paid users. With that in mind, we are expanding the anonymous usage data that the conda package manager delivers when used alongside these Anaconda products: Anaconda Distribution and Anaconda Navigator.
You will not be affected by this change if you rely exclusively on community channels like conda-forge and installers like Miniforge. Mindful of our commitment to the community, we will not submit this change to the conda project, the conda package itself, or the Miniconda installer. We have also made it simple to disable this additional data collection, if you choose.
In this article, we detail what data is being collected, what additional data will be collected, where and when this update is happening, why we are now collecting this additional data, how this data will be used by Anaconda, and how it will benefit our customers and communities. The article finishes with a deep dive into how user data is managed in Anaconda products.
When a conda client requests a package or an index from a repository, it uses an industry-standard mechanism to include generic identification information in the request:
With our change in place, three randomly generated tokens are added to each request:
The details of how this works are given in the “Deeper dive” section below. But as indicated above, the tokens are random: they contain no personally identifiable information—not even the name of your conda environment. They do, however, help us draw better statistical conclusions about usage by allowing us to more precisely distinguish between distinct users, environments, and transactions in our access logs.
This update is accomplished using a new conda plugin implemented in the new anaconda-anon-usage conda package. This package will be added to Anaconda products in phases to help ensure a smooth rollout:
As explained in the introduction, this package is not being added to conda itself. Our intent is to collect data associated specifically with users who engage with Anaconda products, and not the larger open-source community, whose members may prefer to rely entirely on community-driven resources.
This additional data will help Anaconda serve both our community and our customers better, in a variety of ways.
On the community side, we are always looking to improve our ability to understand usage patterns in lightweight, privacy-preserving ways. This additional information allows us to perform analyses with much more accuracy by disaggregating, or separating, the raw usage data across users, transactions, and environments.
Here are just a few examples of questions we will be able to answer better:
It is our commitment to our community to find ways to share these insights with you. Specifically, we are looking at ways to provide this information to Anaconda channel owners and package developers, likely with an expansion of our existing condastats project.
Of course, our ability to invest significant resources into the conda community rests on the success of our revenue-generating products as well. To that end, we acknowledge that there are several ways that this data will help us improve our business. For instance:
For more information on how Anaconda generates revenue to support our business operations and continue our community investments, see our recent blog post, “Is conda free?”
We certainly hope that you will consider leaving the random tokens in place. But if you do wish to disable them, you can do so by running this command:
conda config --set anaconda_anon_usage off
You may also manually edit your conda configuration file and add the line:
anaconda_anon_usage: false
To re-enable the additional usage data, run this command,
conda config --set anaconda_anon_usage on
or remove the anaconda_anon_usage entry from your configuration file. No matter what you choose, your choice will remain in effect—even if you uninstall and reinstall Anaconda, as long as you do not delete your conda configuration file when doing so.
We’re grateful for the trust our users have placed in us. As always, with every change we make to our solutions, we are working to serve you better and continue our work to provide centralized, secure access to thousands of Python and R repositories, packages, and libraries. We will continue to champion the data science community and steward open-source projects that make it easier for you to innovate, build, and deploy effective solutions in your field.
In this section, we offer a more technical introduction to the mechanism by which both conda and anaconda-anon-usage determine and transmit the user data discussed above.
The key mechanism for transmitting this user data is the industry-standard HTTP user agent string. Your web browser transmits such a string along with every request it makes to a website. The string generally contains information about the computer, operating system, and browser; for example:
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36
Conda uses the same protocols to make requests of package repositories, so it, too, generates user agent strings. Here is a typical example:
conda/23.7.3 requests/2.31.0 CPython/3.10.12 Darwin/22.6.0 OSX/13.5.1
As you can see, this string contains information about the versions of various components of your operating system and conda environment. You can see the precise string that your conda environment uses by running the command conda info and examining the user agent line. Alternative conda clients such as mamba and pixi use very similar user agents as well. HTTPS encryption ensures that the content of these headers is protected from snooping.
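To make the header mechanism concrete, here is a minimal Python sketch that attaches a conda-style user agent string to a request object without sending it. The URL is a placeholder rather than a real channel address, and the sketch uses Python's standard urllib module instead of the requests library that conda reports in its own user agent, so treat it as an illustration of the HTTP header only, not of conda's internals.

```python
import urllib.request

# The example conda user agent string shown above.
USER_AGENT = "conda/23.7.3 requests/2.31.0 CPython/3.10.12 Darwin/22.6.0 OSX/13.5.1"

# Placeholder URL for illustration only; nothing is sent over the network.
URL = "https://example.com/channel/noarch/repodata.json"

# Build the request object with the user agent attached as an HTTP header.
request = urllib.request.Request(URL, headers={"User-Agent": USER_AGENT})

# urllib stores header names in capitalized form, so the key is "User-agent".
sent_header = request.get_header("User-agent")
print(sent_header)
```

The server receiving such a request sees the string in the User-Agent header alongside the requested URL, which is how it ends up in access logs.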
When anaconda-anon-usage is installed, it uses the conda plugin mechanism to augment the conda user agent string. Specifically, it adds a new version token and three additional, randomly generated tokens. This longer user agent string might look like this:
conda/23.7.3 requests/2.31.0 CPython/3.10.12 Darwin/22.6.0 OSX/13.5.1 aau/0.3.0 c/16lUJyi7R8u-Co33mZJElQ s/YYFCctOeTjyDnXLazjLy_A e/rVB0_HxgRXKPLzKt9sKcVA
Here is what each of these tokens means:
So for instance, if you run the commands ‘conda install -n base pandas’ and ‘conda install -n child1 panel’ on the same machine, the user agent strings would have the same client token, but different session and environment tokens.
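The comparison described above can be sketched in Python. The helper name parse_user_agent is hypothetical, and the reading of c, s, and e as the client, session, and environment tokens is an assumption based on the token names used in this article. The second user agent string reuses the client token from the example above and pairs it with made-up session and environment values purely for illustration.

```python
def parse_user_agent(user_agent: str) -> dict[str, str]:
    """Split a user agent string into a {name: value} mapping."""
    fields = {}
    for part in user_agent.split():
        # Each component has the form "name/value".
        name, _, value = part.partition("/")
        fields[name] = value
    return fields


COMMON = "conda/23.7.3 requests/2.31.0 CPython/3.10.12 Darwin/22.6.0 OSX/13.5.1 aau/0.3.0"

# The example string from this article (for instance, a transaction in base).
base_ua = COMMON + " c/16lUJyi7R8u-Co33mZJElQ s/YYFCctOeTjyDnXLazjLy_A e/rVB0_HxgRXKPLzKt9sKcVA"

# Illustrative only: same client token, made-up session and environment tokens.
child_ua = COMMON + " c/16lUJyi7R8u-Co33mZJElQ s/AAAAAAAAAAAAAAAAAAAAAA e/BBBBBBBBBBBBBBBBBBBBBB"

base = parse_user_agent(base_ua)
child = parse_user_agent(child_ua)

same_client = base["c"] == child["c"]
same_session = base["s"] == child["s"]
same_environment = base["e"] == child["e"]
print(same_client, same_session, same_environment)
```

Grouping log entries by one of these fields is what allows requests to be separated by client, session, or environment without knowing anything else about the user.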
Each token is generated in the same way:
~/.conda/aau_token, $CONDA_PREFIX/etc/aau_token
%USERPROFILE%\.conda\aau_token, %CONDA_PREFIX%\etc\aau_token
If you delete the saved tokens, a new set of values will be regenerated automatically during the next conda transaction—unless you disable this telemetry altogether.
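The save-and-regenerate behavior can be illustrated with a short Python sketch. This is not the anaconda-anon-usage implementation: the function name load_or_create_token is hypothetical, and the choice of 16 random bytes is an assumption inferred from the 22-character URL-safe tokens in the example above. The sketch writes to a temporary directory rather than to your real conda configuration.

```python
import secrets
import tempfile
from pathlib import Path


def load_or_create_token(path: Path) -> str:
    """Return the token saved at `path`, creating a random one if missing."""
    if path.exists():
        return path.read_text().strip()
    # Assumption: 16 random bytes, which encode to 22 URL-safe characters.
    token = secrets.token_urlsafe(16)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(token)
    return token


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for a saved token file such as ~/.conda/aau_token.
    token_file = Path(tmp) / ".conda" / "aau_token"

    first = load_or_create_token(token_file)   # file missing: token created
    second = load_or_create_token(token_file)  # file present: token reused

    token_file.unlink()                        # simulate deleting the token
    third = load_or_create_token(token_file)   # a fresh token is generated

print(first == second)
print(first == third)
```

Because the token comes from a random source and nothing else, it carries no information about the machine, the user, or the environment name.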
The anaconda-anon-usage package is, of course, open source, with a standard BSD license. The source code is available publicly if you seek an even deeper dive!