Docker design: exchange data between containers or put multiple processes in one container?

In a current project I have to perform the following tasks (among others):



  • capture video frames from five IP cameras and stitch a panorama

  • run machine learning based object detection on the panorama

  • stream the panorama so it can be displayed in a UI

Currently, the stitching and the streaming run in one Docker container, and the object detection runs in another, reading the panorama stream as input.



Since I need to increase the input resolution for the object detector while maintaining the stream resolution for the UI, I have to look for alternative ways of getting the stitched, full-resolution panorama (~10 MB per frame) from the stitcher container to the detector container.



My thoughts regarding potential solutions:



  • Shared volume. Potential downside: one extra write and read per frame might be too slow?

  • Using a message queue or e.g. Redis. Potential downside: yet another component in the architecture.

  • Merging the two containers. Potential downsides: not only does it not feel right, but the two containers have completely different base images and dependencies. Plus I'd have to worry about parallelization.

Since I'm not the sharpest knife in the Docker drawer, what I'm asking for are tips, experiences and best practices regarding fast data exchange between Docker containers.

asked 8 hours ago by creimers

2 Answers

Most communication between Docker containers happens over network sockets. This is fine when you're talking to something like a relational database or an HTTP server. It sounds like your application is more about sharing files, though, and that's something Docker is less good at.



If you only want one copy of each component, or are still actively developing the pipeline: I'd probably not use Docker for this. Since each container has an isolated filesystem and its own user database (/etc/passwd), sharing files can be unexpectedly tricky (every container must agree on numeric user IDs). But if you just run everything on the host, as the same user, pointing at the same directory, this isn't a problem.
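
To make the user-ID point concrete, here is a minimal sketch (not part of the original answer) that a component could run at startup to see whether it and the shared directory agree on numeric ownership. The function name `ownership_report` is made up for illustration, the temporary directory only stands in for the real shared directory, and `os.getuid` assumes a Unix-like system.

```python
import os
import tempfile


def ownership_report(path):
    """Compare the numeric owner of `path` with the UID of this process."""
    st = os.stat(path)
    return {
        "path": path,
        "owner_uid": st.st_uid,          # numeric owner stored on the filesystem
        "process_uid": os.getuid(),      # numeric UID this process runs as
        "same_owner": st.st_uid == os.getuid(),
        "readable": os.access(path, os.R_OK),
        "writable": os.access(path, os.W_OK),
    }


if __name__ == "__main__":
    # Assumption: a temporary directory stands in for the shared directory.
    with tempfile.TemporaryDirectory() as shared_dir:
        print(ownership_report(shared_dir))
```

If two containers print different `process_uid` values for the same mounted directory, that mismatch is the kind of thing that tends to cause permission errors later.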



If you're trying to scale this in production: I'd add some sort of shared filesystem and a message queueing system like RabbitMQ. For local work the shared filesystem could be a Docker named volume or a bind-mounted host directory; cloud storage like Amazon S3 will work fine too. The setup is like this:



  • Each component knows about the shared storage and connects to RabbitMQ, but is unaware of the other components.

  • Each component reads a message from a RabbitMQ queue that names a file to process.

  • The component reads the file and does its work.

  • When it finishes, the component writes the result file back to the shared storage, and publishes the file's location to a RabbitMQ exchange.

In this setup each component is totally stateless. If you discover that, for example, the machine-learning component is the slowest, you can run duplicate copies of it. If something breaks, RabbitMQ will remember that a given message hasn't been fully processed (acknowledged); and again, because of the isolation, you can run that specific component locally to reproduce and fix the issue.
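
The steps above can be sketched roughly as follows. This is only an illustration under loud assumptions: Python's in-process `queue.Queue` stands in for the RabbitMQ queue and exchange, a temporary directory stands in for the shared storage, and `process` and `handle_one` are hypothetical names. A real component would use a RabbitMQ client library instead.

```python
import queue
import tempfile
from pathlib import Path


def process(data: bytes) -> bytes:
    """Placeholder for the real work (stitching, detection, ...)."""
    return data[::-1]


def handle_one(inbox, outbox, storage):
    """Take one file name from `inbox`, process that file, announce the result."""
    name = inbox.get()                          # the message only names a file
    result = process((storage / name).read_bytes())
    out_name = name + ".out"
    (storage / out_name).write_bytes(result)    # the result goes to shared storage
    outbox.put(out_name)                        # publish the location, not the data
    # Stand-in for an acknowledgement: done only after the result is written.
    # Unlike RabbitMQ, queue.Queue does not redeliver unacknowledged messages.
    inbox.task_done()
    return out_name


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        storage = Path(tmp)
        (storage / "frame-0001.raw").write_bytes(b"abc")
        inbox, outbox = queue.Queue(), queue.Queue()
        inbox.put("frame-0001.raw")
        handle_one(inbox, outbox, storage)
        print(outbox.get(), (storage / "frame-0001.raw.out").read_bytes())
```

The component only ever sees a queue and a directory, which is what lets you run several copies of the slow stage side by side.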



This model also translates well to larger-scale Docker-based cluster-computing systems like Kubernetes.



Running this locally, I would absolutely keep separate concerns in separate containers (especially if individual image-processing and ML tasks are expensive). The setup I propose needs both a message queue (to keep track of the work) and a shared filesystem (because message queues tend not to be optimized for individual messages of 10+ MB). You get a choice between Docker named volumes and host bind mounts as readily available shared storage. Bind mounts are easier to inspect and administer, but on some platforms (macOS in particular) are legendarily slow. Named volumes are, I think, reasonably fast, but you can only access them from Docker containers, which means launching more containers to do basic things like backup and pruning.
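
Whether a volume is fast enough for ~10 MB frames is easiest to settle by measuring. The sketch below is an assumption-laden illustration, not something from the setup described above: `publish_frame` and `round_trip_seconds` are hypothetical helpers, and the temporary directory should be replaced by the mounted volume you actually want to test. It writes each frame under a temporary name and renames it, so a reader never picks up a half-written file.

```python
import os
import tempfile
import time
from pathlib import Path

FRAME_SIZE = 10 * 1024 * 1024  # ~10 MB, the frame size mentioned in the question


def publish_frame(directory: Path, name: str, data: bytes) -> Path:
    """Write under a temporary name, then rename into place."""
    final_path = directory / name
    tmp_path = directory / (name + ".tmp")
    with open(tmp_path, "wb") as f:
        f.write(data)
    # The rename is atomic on POSIX when both paths are on the same filesystem.
    os.replace(tmp_path, final_path)
    return final_path


def round_trip_seconds(directory: Path, data: bytes) -> float:
    """Time one write followed by one read of the same frame."""
    start = time.perf_counter()
    path = publish_frame(directory, "frame.raw", data)
    read_back = path.read_bytes()
    elapsed = time.perf_counter() - start
    assert read_back == data
    return elapsed


if __name__ == "__main__":
    # Assumption: point this at the mounted volume instead of a temp directory.
    with tempfile.TemporaryDirectory() as tmp:
        seconds = round_trip_seconds(Path(tmp), os.urandom(FRAME_SIZE))
        print(f"write + read of one frame took {seconds * 1000:.1f} ms")
```

The read will usually be served from the page cache, so treat the number as optimistic and compare it against your frame interval with some margin.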







answered 8 hours ago by David Maze

Alright, let's unpack this:



  • Shared volume: IMHO this works just fine, but gets messy over time, especially if you're handling stateful services.

  • MQ: this seems like the best option in my opinion. Yes, it's another component in your architecture, but it makes more sense to have it than to maintain messy shared volumes or handle massive container images (if you manage to combine the two container images).

  • Merging the containers: yes, you could potentially do this, but it's not a good idea. Considering your use case, I'm going to assume that you have a massive list of dependencies, which could potentially lead to conflicts. Also, lots of dependencies = larger image = larger attack surface, which is not a good thing from a security perspective.

                          If you really want to run multiple processes in one container, it's possible. There are multiple ways to achieve that, however I prefer supervisord.



                          https://docs.docker.com/config/containers/multi-service_container/







                          share|improve this answer












                          share|improve this answer



                          share|improve this answer










                          answered 8 hours ago









                          Subramanya Vajiraya


































