Searching Images by Color
The following is the final project for my Computational Photography class.
Piximilar: Image by Color is a project based on the work of Idée Labs, where you search for images by color rather than by text. Piximilar lets web designers define the color scheme on the drawing board and write the cascading style sheet (CSS) before picking out the images. Piximilar draws on Flickr’s vast free photo archive, picking out only the images that Flickr deems “interesting”. It analyzes these pictures using an in-house algorithm that maps each pixel to one of the colors in a fixed palette, allowing pictures to be retrieved based on their color content. The processing is hidden from the user behind a user interface that is user-friendly, scalable, and fast.
One of the most important aspects of designing a website is finding the right images. Piximilar offers an innovative approach to finding them, allowing one to search through thousands of interesting images by clicking on a desired color. Piximilar can even combine up to four colors, giving designers a new way to find interesting images that match their website’s theme. Piximilar also caters to businesses that rely on color combinations for branding purposes. Furthermore, unlike Google Image Search, Piximilar only searches through images with non-restrictive licenses, that is, images that are free to use, free to modify, and free to profit from. As we will see, Piximilar is a very handy tool that every web developer and graphic designer should be equipped with.
My main motivation for taking on this problem came after I saw the work of Idée Labs and how they addressed it. I found their tool extremely useful and innovative. This web application, coupled with my aspiration to be a professional web developer, motivated me to build my own web application that finds images by color.
The underlying problem that I will address in this paper is how to find images that are similar, not according to the text around them, but according to the colors they contain. I will go through the three main steps needed to solve this problem: how to find images to index and analyze, how to derive meaning and find similarities from images without a textual context, and how to wrap these results in an interface that makes them useful for everyone.
This project is based off the work of Idée Labs. Idée Labs created the Multicolr Search Lab, which extracts the colors from over 10 million of the “most interesting” free photos on Flickr and allows you to search through the images by color. The picture below shows the interface they designed to leverage their visual similarity technology:
As you can see, you click on colors in the swatch palette to select them. In this case, I clicked on a green, a yellow, a red, and a purple. The results are the interesting images from their ten-million-image database that contain the largest share of the selected colors. Another company that has implemented searching images by color is Google. This feature is very new; in fact, they updated their Image Search to include it while I was working on this project. Google implements search by color differently from Idée Labs: they offer it as an additional filter for a textual search. So typically you would search for a word, say “bird”, and then select a color, say “red”, to find only birds that are red. The picture below shows the results of this query:
Google operates on a much larger scale, analyzing billions of images indexed from the web. A “Find Image by Color” option is better as a filter for a text search in this case because their database is so vast that it would be nearly impossible to find interesting images given a set of colors.
The theories and assumptions I made going into the project were invaluable in producing the final product. I assumed that there would be a way of mining Flickr for free images that were also interesting to look at. This was a big assumption, but after looking at the Multicolr Search tool, I knew it had to be possible. The theory was that I would design a palette of colors for the user to choose from. These options were only a small subset of the possible RGB colors, but they provided a good sampling of the entire set. Next, I needed to go through each pixel of each image and map it to one of the colors in the palette. I assumed that PHP, the scripting language I was using, would be capable of extracting RGB values from images and would be powerful enough to do this mapping. I also needed a place to store these mappings so that they could be used in a web application. I assumed that MySQL would be capable of storing over 20 million entries, and that it would be able to perform calculations on these entries quickly enough to keep the interface interactive for the user. The theory was that since MySQL uses powerful set theory algorithms, it would be able to sort and multiply a vast number of rows very quickly, which was crucial for my project to function correctly. Furthermore, if MySQL was capable of doing the heavy lifting, I assumed that the PHP script responsible for sending updates to the web application would run swiftly. My final theory was that if I obtained over 100,000 images, my application would have enough pictures to choose from for selecting images by color to work effectively.
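The pixel-to-palette mapping described above can be sketched as follows. This is a minimal illustration in Python rather than the PHP and MySQL the project actually used, and it rests on several assumptions: the six-color PALETTE, the squared RGB distance used to pick the nearest color, and the ranking by the product of color fractions are all hypothetical stand-ins, since the write-up does not spell out the actual palette, distance measure, or query.

```python
from collections import Counter

# Hypothetical palette: a small sample of RGB colors, not the project's actual palette.
PALETTE = {
    "red": (255, 0, 0),
    "green": (0, 128, 0),
    "blue": (0, 0, 255),
    "yellow": (255, 255, 0),
    "black": (0, 0, 0),
    "white": (255, 255, 255),
}


def nearest_palette_color(pixel):
    """Return the name of the palette color closest to an (r, g, b) pixel."""
    r, g, b = pixel

    def squared_distance(name):
        pr, pg, pb = PALETTE[name]
        return (r - pr) ** 2 + (g - pg) ** 2 + (b - pb) ** 2

    return min(PALETTE, key=squared_distance)


def color_fractions(pixels):
    """Map every pixel to a palette color and return each color's share of the image."""
    pixels = list(pixels)
    counts = Counter(nearest_palette_color(p) for p in pixels)
    return {name: count / len(pixels) for name, count in counts.items()}


def rank_images(index, selected):
    """Rank image ids by the product of their fractions for the selected colors.

    `index` maps an image id to the dict produced by color_fractions.
    Images that lack any of the selected colors score 0 and are dropped.
    """
    def score(image_id):
        product = 1.0
        for color in selected:
            product *= index[image_id].get(color, 0.0)
        return product

    scored = [(score(image_id), image_id) for image_id in index]
    return [image_id for s, image_id in sorted(scored, reverse=True) if s > 0]
```

In this sketch the per-image fractions play the role of the rows stored in the database, and rank_images stands in for the sort-and-multiply step that the project delegated to MySQL.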
This section is divided into the three crucial steps required for this project: getting the images from Flickr, analyzing the images, and displaying the results. The first step involved mining Flickr for images. Luckily, Flickr provides a way of doing this through their API. Flickr’s API provides a method called flickr.interestingness.getList, which returns a list of interesting photos for the most recent day or a user-specified date. This method takes some options, including the date, the number of items per page, and the page number. I designed my script to grab the XML generated by the API using this URL:
https://api.flickr.com/services/rest/?method=flickr.interestingness.getList&api_key=API_KEY&date=DATE&per_page=500&page=1
Here API_KEY is the key every Flickr developer has to apply for, and DATE is the date I specified. One of the limitations of the Flickr API is that you can only retrieve a maximum of 500 images per page for a specified date. I needed far more than 500 images; I needed thousands. The workaround was to write a script that went through every date from January 1st 2004 to March 1st 2009 and grabbed the XML for 500 of the interesting photos on each of those dates. Since this approach can lead to unexpected results, I tested my script extensively to ensure that it could handle anomalies such as days that do not have 500 interesting images, as well as requests for a day that does not exist (e.g., February 30th).
Getting the XML for each day is only the first step in acquiring the images. The XML allows you to generate a URL that links directly to the image. So for each day, after retrieving the XML, the script would generate this URL and then save the image to the desktop. The XML looked like this:
<photos page="1" pages="5" perpage="100" total="500">
<photo owner="9137439@N08" secret="ea5fa34246" server="3550" farm="4" title="Poolside Reflections" ispublic="1" isfriend="0" isfamily="0" />
I then used an XML parser to obtain each photo’s farm number, server number, id, and secret code, and plugged these values into this URL:
https://farm{farm}.static.flickr.com/{server}/{id}_{secret}_s.jpg
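The retrieval steps above can be sketched roughly as follows. This is an illustration in Python rather than the original PHP script, and it makes no network requests. The function names are hypothetical, the YYYY-MM-DD date format is an assumption about what the API expects, and the sketch assumes each photo element carries an id attribute, which the XML excerpt above does not show.

```python
from datetime import date, timedelta
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

API_ENDPOINT = "https://api.flickr.com/services/rest/"
# Image URL pattern from the text, with the plugged-in values written as placeholders.
IMAGE_URL = "https://farm{farm}.static.flickr.com/{server}/{id}_{secret}_s.jpg"


def daterange(start, end):
    """Yield every calendar date from start to end, inclusive.

    Stepping through real dates means a nonexistent day such as
    February 30th is never requested.
    """
    day = start
    while day <= end:
        yield day
        day += timedelta(days=1)


def interestingness_url(api_key, day, per_page=500, page=1):
    """Build the flickr.interestingness.getList request URL for one day."""
    params = {
        "method": "flickr.interestingness.getList",
        "api_key": api_key,
        "date": day.isoformat(),  # assumed format: YYYY-MM-DD
        "per_page": per_page,
        "page": page,
    }
    return API_ENDPOINT + "?" + urlencode(params)


def image_urls(xml_text):
    """Parse one day's XML and return a direct image URL for each photo."""
    urls = []
    for photo in ET.fromstring(xml_text).iter("photo"):
        attrs = photo.attrib
        # Skip entries that lack any value the URL pattern needs.
        if all(key in attrs for key in ("farm", "server", "id", "secret")):
            urls.append(IMAGE_URL.format(**attrs))
    return urls  # may hold fewer than 500 entries on sparse days


# Usage sketch: one request URL per day in the range described in the text.
request_urls = [
    interestingness_url("API_KEY", day)
    for day in daterange(date(2004, 1, 1), date(2009, 3, 1))
]
```

Downloading each image would then be a matter of fetching every URL returned by image_urls and writing the bytes to disk, which is left out of the sketch.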
The results turned out great, but there is still room for improvement. Perhaps the biggest improvement would be to automatically link each image to its page on Flickr. The best way of solving this problem is to save the owner information from the XML feed to the database, allowing each image to link to the following URL:
https://www.flickr.com/photos/{owner}/{id}
where both the id and the owner are retrieved from the database. Another improvement would be to acquire more images. The Multicolr Search Lab has over 10 million images, which makes its results much more impressive. The final improvement would target performance. The web application takes about two seconds to load the images after the user clicks on a color. If I were to organize the data in a clever way, perhaps by using a tree or graph, I might be able to squeeze some extra performance out of my application. Despite these areas that need improvement, I took a complicated problem and broke it down into steps that were each solvable, creating an innovative web application that allows users to search through images by color rather than by text.
- Multicolr Search Lab: Flickr Set. Idée Labs. 2008. https://labs.ideeinc.com/multicolr/
- Google Image Search. Google. 2009. https://images.google.com/
- Flickr Interesting Photos. Yahoo. 2009. https://www.flickr.com/explore/interesting/
- PHP Manual. PHP. 2009. https://www.php.net/
- jQuery Documentation. jQuery. 2009. https://docs.jquery.com/Main_Page/