Jump to content

Very large catalog (3.4 million products) [SOLVED]


J. Jackson

Recommended Posts

I am working on a site for a customer with 3.4 million products. I am running into issue with memory and queries long before I get to the full catalog size. They are currently using a very old product called "Interchange" - its simple database structure combined with super simplistic page layout works, but it is a 30 year old design... it's time to get them up and running on a more modern platform.

I have written custom import libraries and can get the products properly imported. I have been increasing the catalog size in powers of 10 and when I got to 1 million items, things went pear shaped.

The product detail page (showing a single item) alone required me to bump the max PHP memory to 512MB / process. This seemed off to me, so I enabled the debugging and profiling.

With caching disabled, this page throws 450 queries at the database. With it enabled (which causes other issues with catalogs of this size), it still executes roughly 140 queries.  Some of the queries are pulling back a number of rows equal to the product count in the database.

The hardware needed is still being evaluated. If need be, I can put multiple headend servers talking to one of more dedicated MySQL servers. However, at 450 queries per item displayed and half a gig per process, search engine spiders are going to annihilate the site. I can't block web crawlers as they have nearly a $1 million / year in sales depending on organic search traffic.

Does anyone have any suggestions for optimizing PrestaShop for very large catalogs other than "throw more hardware at it"? I am trying to avoid having to do too awful much modifying of the core PrestaShop code as this would become a very expensive support issue for them. They have a decent revenue stream, but several employees and a warehouse full of parts - net profits don't really support massive IT spending.

Edited by J. Jackson
typo fixes, [solved] (see edit history)
Link to comment
Share on other sites

Posted (edited)

The general solution to the problem was the discovery of an inherent flaw in the "recently viewed items" module. It causes the module's memory usage to scale with the number of active items in the products table.

At 1 million items, the module needs >400MB of RAM. At 3.4 million, it will need ~1.5GB of RAM as the issue scales linearly. The module is pulling back a list of all of the active product IDs in the database 3 times.

Dropping the "recently viewed items" module as well as the "items in the same category" module which has similar but not as pronounced problems, the memory usage for a product display page dropped from 416MB to 5MB. In addition, the load time for the page decreased from 4sec to 19ms.

While I would certainly expect that the load times and memory usage would increase with the use of these 2 modules, they do not scale properly for large product catalogs. 

I will further address this issue in the development forums.

Edited by J. Jackson
solution (see edit history)
Link to comment
Share on other sites

  • J. Jackson changed the title to Very large catalog (3.4 million products) [SOLVED]
  • 4 months later...
On 4/25/2025 at 5:33 PM, J. Jackson said:

I am working on a site for a customer with 3.4 million products. I am running into issue with memory and queries long before I get to the full catalog size. They are currently using a very old product called "Interchange" - its simple database structure combined with super simplistic page layout works, but it is a 30 year old design... it's time to get them up and running on a more modern platform.

I have written custom import libraries and can get the products properly imported. I have been increasing the catalog size in powers of 10 and when I got to 1 million items, things went pear shaped.

The product detail page (showing a single item) alone required me to bump the max PHP memory to 512MB / process. This seemed off to me, so I enabled the debugging and profiling.

With caching disabled, this page throws 450 queries at the database. With it enabled (which causes other issues with catalogs of this size), it still executes roughly 140 queries.  Some of the queries are pulling back a number of rows equal to the product count in the database.

The hardware needed is still being evaluated. If need be, I can put multiple headend servers talking to one of more dedicated MySQL servers. However, at 450 queries per item displayed and half a gig per process, search engine spiders are going to annihilate the site. I can't block web crawlers as they have nearly a $1 million / year in sales depending on organic search traffic.

Does anyone have any suggestions for optimizing PrestaShop for very large catalogs other than "throw more hardware at it"? I am trying to avoid having to do too awful much modifying of the core PrestaShop code as this would become a very expensive support issue for them. They have a decent revenue stream, but several employees and a warehouse full of parts - net profits don't really support massive IT spending.

I find your experience really impressive and I’d like to know more about how you handled a catalog of this size. I once worked with a client that had a large product base, but nowhere near what you encountered. In our case we used an AWS structure with load balancing and auto-scaling to handle traffic peaks, and separate databases to better manage performance.

Could you please share:

Which hosting provider or infrastructure setup did you use to manage 3.4 million products?

Did you rely on a cloud solution (AWS, GCP, Azure) or a dedicated server environment?

How did you handle database performance and scaling, for example, did you use read replicas, caching layers, or search engines like Elasticsearch/Solr?

Did you have to implement any special load balancing or CDN strategies to cope with traffic and crawlers?

Link to comment
Share on other sites

Unfortunately, the decision was made a while back to drop PrestaShop for this project. The number of core code changes that were being made led to the realization that managing update patches against future versions of Presta was simply going to be too much effort and time. The customer has no full-time IT staff and the cost of having these updates performed at consulting rates is cost-prohibitive for them.

I still really like PrestaShop and have used it for some smaller sites and catalogs, but I've found it simply isn't suitable for this sort of site unless you have the time and resources to maintain large code changes against the core project. In addition, the hardware requirements for catalogs of this size do not work well for shops with tight budgets. I did submit a pull request to fix the memory usage flaw in the Viewed Products module which is still pending and has not been merged. Ultimately, this tends to indicate that scalability isn't of great concern with Presta as the fix is only a few lines of code.

The settled on solution will not be popular here, but is working out well. The decision was made to write a custom site for this customer. They have steep catalog requirements, but a rather simple site and business model with some very specific functionality requirements not found in generic e-Commerce solutions. The site was developed as a .Net, C# application using Razor pages and SQL Server as a data backend. I was comfortable going this route as I have over 20 years experience working in C# for enterprise systems and nearly 30 years experience with SQL Server. The number of hours spent on a custom, tailor-made solution was less than the amount of time being spent to modify Presta. The solution is hosted on AWS with load balancing achieved using custom built systems based on AWS Linux + HAProxy. Bot wrangling is performed by placing the site behind Cloudflare with throttles for known bots. In addition, when the system comes under heavy load (which is nearly always bad-actor crawler traffic), Cloudflare's API is used to automatically activate JavaScript challenges for areas of the site which invoke any of the site's search features (this is the heaviest load on the system, by far). Legit bots are never directed to these areas of the site, have exclusions in robots.txt, and the site makes use of nofollow tags on links which shouldn't be indexed.

The site was purpose built to allow for an arbitrary number of web front-ends. The load balancing and scaling solution was custom built instead of using AWS auto-scaling to save on costs.

In conclusion, the increase in software licensing costs was more than offset by the savings in hardware and time expenditures. Could this system have been developed using PHP or some other Linux based solution? Of course. It still would have been possible to use PrestaShop, but the complexity inherent in Presta's customizability and flexibility made the required modifications and hardware requirements cost prohibitive. 

Link to comment
Share on other sites

30 minutes ago, J. Jackson said:

Unfortunately, the decision was made a while back to drop PrestaShop for this project. The number of core code changes that were being made led to the realization that managing update patches against future versions of Presta was simply going to be too much effort and time. The customer has no full-time IT staff and the cost of having these updates performed at consulting rates is cost-prohibitive for them.

I still really like PrestaShop and have used it for some smaller sites and catalogs, but I've found it simply isn't suitable for this sort of site unless you have the time and resources to maintain large code changes against the core project. In addition, the hardware requirements for catalogs of this size do not work well for shops with tight budgets. I did submit a pull request to fix the memory usage flaw in the Viewed Products module which is still pending and has not been merged. Ultimately, this tends to indicate that scalability isn't of great concern with Presta as the fix is only a few lines of code.

The settled on solution will not be popular here, but is working out well. The decision was made to write a custom site for this customer. They have steep catalog requirements, but a rather simple site and business model with some very specific functionality requirements not found in generic e-Commerce solutions. The site was developed as a .Net, C# application using Razor pages and SQL Server as a data backend. I was comfortable going this route as I have over 20 years experience working in C# for enterprise systems and nearly 30 years experience with SQL Server. The number of hours spent on a custom, tailor-made solution was less than the amount of time being spent to modify Presta. The solution is hosted on AWS with load balancing achieved using custom built systems based on AWS Linux + HAProxy. Bot wrangling is performed by placing the site behind Cloudflare with throttles for known bots. In addition, when the system comes under heavy load (which is nearly always bad-actor crawler traffic), Cloudflare's API is used to automatically activate JavaScript challenges for areas of the site which invoke any of the site's search features (this is the heaviest load on the system, by far). Legit bots are never directed to these areas of the site, have exclusions in robots.txt, and the site makes use of nofollow tags on links which shouldn't be indexed.

The site was purpose built to allow for an arbitrary number of web front-ends. The load balancing and scaling solution was custom built instead of using AWS auto-scaling to save on costs.

In conclusion, the increase in software licensing costs was more than offset by the savings in hardware and time expenditures. Could this system have been developed using PHP or some other Linux based solution? Of course. It still would have been possible to use PrestaShop, but the complexity inherent in Presta's customizability and flexibility made the required modifications and hardware requirements cost prohibitive. 

It is true that the initial cost of developing a platform from scratch may be lower than modifying an existing PrestaShop setup. However, when additional features are needed, such as a highly effective Facebook Pixel integration, building them from the ground up could become very costly. In contrast, PrestaShop offers ready-made modules for these needs, and the same applies to marketplace integrations like Amazon or others.

This means that while the upfront development cost may be lower for a custom solution, the ongoing development and maintenance costs could end up being significantly higher.

That said, you do raise a valid point. For instance, we currently run stores on PrestaShop 1.7 and 8.2 hosted on AWS, and performance has not been ideal. We also faced considerable challenges migrating from a traditional hosting provider to AWS, which revealed some of PrestaShop’s limitations.

Thank you again for sharing these detailed insights. They highlight the fact that if a business opts for a custom solution, every future feature will need to be developed from scratch, which may or may not align with their expectations and budget.

Link to comment
Share on other sites

For various reasons those large shops run on borrowed time.
The larger the site becomes, the more attention those need to keep running. Updates bekomme a gamble.  You'd need an IT Staff to handle it.
Spread the products into 20 and the respective shops become more manageable. Easier to move. You also need IT Staff but not too many.
 

Link to comment
Share on other sites

The site I am working with has a large catalog of parts, but the site isn't all that large or complex. They have also been online since 1997. This project is merely an attempt to modernize the site a bit. Their old site design hadn't changed in literally 25 years.

It's a small engine and outdoor lawn equipment site selling replacement parts. It would make zero sense to split the parts catalog into multiple shops.

Link to comment
Share on other sites

On 8/31/2025 at 3:41 PM, J. Jackson said:

It's a small engine and outdoor lawn equipment site selling replacement parts. It would make zero sense to split the parts catalog into multiple shops.

As long as you can handle that ok. 
For businesses who wish to expand more shops bring more revenue. You can optimize articles and search-terms instead of having a catalogue online. 
Servers close, hosting faults and such making smaller shops easier to move.  

Link to comment
Share on other sites

Something that is almost always overlooked with very large catalogs is the basics:

o Hosting – If you’re still on old hardware, you’re leaving performance on the table. NVMe SSD hosting is 10x faster than the drives most older shops are stuck on.

o MySQL tuning – Most hosting packages ship with a bare-minimum my.cnf config that isn’t tuned for anything. A properly tuned database config often makes a bigger difference than any “performance” module or caching layer (which rarely works well on dynamic pages anyway).

I’ve seen shops with millions of products go from crawling to flying just by moving MySQL onto its own NVMe box and tuning the config. Ta-dah.

And honestly… if your agency or freelancer isn’t even bringing up hosting, you should probably find another party, loool.

This blog article may help someone.

Link to comment
Share on other sites

I appreciate the suggestions, however disk I/O wasn't any part of the issue. Any software which fires 450 queries at the database server and consumes 1.5GB of RAM per visitor to display a product detail page has flaws that storage performance and database configuration tweaks won't solve. This was the results of the profiling done against the default install of PrestaShop with their catalog loaded. I submitted a pull request to deal with some of the initial memory issues I ran into, but the query counts are inherent in the way Presta's code is written. Some of these issues can be partially mitigated by reducing the functionality of the site, but that was honestly the initial appeal of using Presta.

In this case, I am the agency working with the customer to update their site / software. I have years of experience in database administration, working with systems much larger than 3.4 million records. Some of my other customers have databases with billions of records and terabytes of data. They, however, are in the tech industry, not the lawn equipment industry.

This was merely a test case for PrestaShop as it might have been a simple solution for their shop. The test systems used to initially profile the software had NVMe storage, hundreds of gigabytes of RAM available, and each test instance could be allocated dozens of cores if needed. However, the target hosting platform can not contain these sorts of specs as it is cost prohibitive to allocate such resources... unless self or co-locating the hosting. They do not have the technical resources for this.

As mentioned previously, the solution ultimately chosen was to create a custom site. The resulting code and software have no problems querying the customer's catalog on much more modest hardware platforms.

This is not a knock on PrestaShop. As I mentioned, I really enjoy the software and have used it for other projects. It just wasn't a good fit for this particular customer's needs. There is no such thing as a truly one size fits all e-Commerce solution. Aside from the size of the catalog, their needs included things like large supersession chains, product substitution lists, and the ability to lookup no longer available parts for cross referencing. These are not part of Presta's base functionality and would have required additional custom modules to be written. There were simply too many modifications needed to make it suit the purpose of the site.

Edited by J. Jackson (see edit history)
  • Like 2
Link to comment
Share on other sites

6 minutes ago, J. Jackson said:

I appreciate the suggestions, however disk I/O wasn't any part of the issue. Any software which fires 450 queries at the database server and consumes 1.5GB of RAM per visitor to display a product detail page has flaws that storage performance and database configuration tweaks won't solve. This was the results of the profiling done against the default install of PrestaShop with their catalog loaded. I submitted a pull request to deal with some of the initial memory issues I ran into, but the query counts are inherent in the way Presta's code is written. Some of these issues can be partially mitigated by reducing the functionality of the site, but that was honestly the initial appeal of using Presta.

In this case, I am the agency working with the customer to update their site / software. I have years of experience in database administration, working with systems much larger than 3.4 million records. Some of my other customers have databases with billions of records and terabytes of data. They, however, are in the tech industry, not the lawn equipment industry.

This was merely a test case for PrestaShop as it might have been a simple solution for their shop. The test systems used to initially profile the software had NVMe storage, hundreds of gigabytes of RAM available, and each test instance could be allocated dozens of cores if needed. However, the target hosting platform can not contain these sorts of specs as it is cost prohibitive to allocate such resources... unless self or co-locating the hosting. They do not have the technical resources for this.

As mentioned previously, the solution ultimately chosen was to create a custom site. The resulting code and software have no problems querying the customer's catalog on much more modest hardware platforms.

This is not a knock on PrestaShop. As I mentioned, I really enjoy the software and have used it for other projects. It just wasn't a good fit for this particular customer's needs. There is no such thing as a truly one size fits all e-Commerce solution. Aside from the size of the catalog, their needs included things like large supersession chains, product substitution lists, and the ability to lookup no longer available parts for cross referencing. These are not part of Presta's base functionality and would have required additional custom modules to be written. There were simply too many modifications needed to make it suit the purpose of the site.

I' not knocking PrestaShop, I'm knocking the fact that 'most' do not tune their mysql...or upgrade their hosting to overpower their prestashop.  I also have previous experience before PrestaShop in large scale IBM systems, loosely tightly coupled capable of running 1'000's of messages a second.  but it's the same on any platform, the cpu is waiting for data...I am perfectly confident that I can significantly speed any PrestaShop, it's one of my favorite things to do.  I will say since I entered the www, it's pretty barbaric loool.

Things changed when fasthost uk started offering 'very' affordable dedicated nvme ssd solutions.  Before them, you  had dedicated cost then the pirates were charging 500 for nvme, but you could only have one because having 2 was a fire hazard.

Fun to discuss, but I'm perfectly confident that if client came in with this issue I could solve without a writing custom solution simple because what happens when they need new feature,  i.e.  some new payment gateway etc?  Anyway....off to finish consolidating 293 parent categories into 86 and improve visitor drilldown using filters, the main reason to choose ps is filtering, you may want to add that to the custom solution. 

Link to comment
Share on other sites

Just fyi - the "Recently Viewed" module fetches the entire list of product ID's (no filters involved) into RAM 3 times and does a for-loop style walk through looking for matching ID's from the cookies. This is the memory flaw I mentioned and uses an undue amount of CPU time. With 3.4MM items, this required upwards of 1.5GB of RAM per visitor to display a list of a half dozen recently viewed items. Thanks to bot traffic these days, it is not uncommon to have dozens of simultaneous requests being made against the site which offer little to no return value. Cloudflare catches most of these, but it can't catch them all. The fix was simple - adding an IN clause to the select for IDs, but still required custom modifications.

The 450 queries (reduced to around 150 with caching enabled) the software executes are mainly to determine all of the feature, layout options, and customizations a site uses. This issue is not related to catalog size.

Edited by J. Jackson (see edit history)
  • Like 2
Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
×
×
  • Create New...