During my last two years of studies, I worked as alternant at an Amazon logistics center. I was in the IT team. There were many day to day tasks and I had a dissertation to write in order to validate my diploma. This dissertation was about a project given to me by Amazon.
As I worked in a logistics center, there were a lot of collaborators working from receiving packages from suppliers, to sorting, dispatching and sending client orders. There was the complete cycle. All of this requires a lot of IT equipment, that was used almost 24/7 by the collaborators.
By no means should the operations stop, so we were there to help them with everything failing, or that needed change of any sort. There was a ticketing web interface where managers could create requests when something was needed from us. Our team walked a lot because we were often helping people, and the center was huge (more than 40 000 square meters of surface).
Many users and managers means a lot of IT equipment. Computers, gun scanners, barcode scanners, monitors and whatnot, everything was identified with asset tags. We had an interface to manage the inventory of all our equipment. This was crucial for us so we knew where each piece of IT equipment was. We could assign it to a person or a zone, set date of return and other parameters.
We also had to make sure everything works on the software side. We had several audits of us checking if our stock of computers was up to date, and collaborators put defective gun scanners in a box we checked each morning. As a consequence, we sometimes had to reinstall Windows on computers, and Android on the scanners. We had provisionning interfaces to do this by batches of machines.
Thin clients were also a part of the IT equipment. These were small PCs running Tiny Core Linux, and there was a PXE server providing this OS to the machines. As it is a very lightweight OS (10MB) it could run directly from the network, no need to install it and collaborators could use it right away. We just had to install them on the stations and activate network boot.
Printer maintenance was crucial as well, especially label printers. They were used hours and hours per day and we were used to perform maintenance on them. The same goes for barcode scanners.
A lot of devices means a lot of data. We managed hundreds of Wi-Fi access points. All of them were from Cisco and the configuration was made via a wireless controller, wich is a physical machine that runs an operating system design to handle and configure these access points.
We had several cabinets of Cisco switches, and it was the team that provisionned them and configured them. Everything was going through the main server room. It was a room with two fiber connections from two different ISPs, several servers like print servers, file servers, monitoring, walkie-talkie server, camera server and others.
There also were several distribution switches as well as level 3 switches. Routers were also present behind the ISP links. Everything was redundant for fault tolerance. There was no single point of failure, which is very important when running critical operations.
Most servers were running Windows Server and others were running Redhat Linux. We performed the required maintenance tasks and updates on these machines. We also sometimes had to change the switches configuration like VLANs and such when moving equipment or servers.
Every single server and network equipment was backed up by UPS, which is a device that has a battery inside to maintain power to equipment when there is a power outage. The UPS were never at full load, neither at half so equipment could run longer in case of a power failure. I managed these UPS via network with the help of the engineer of the team.
I mentionned that we were used to changing equipment and machine configuration. Some of these tasks could be done immediately, but every tasks that would change the system or network configuration required us to write what was called a change management.
A change management is a paper we needed to write before doing something that could cause issues if not done properly. We would write every single action we would do, every single click and command typed needed to be described. We also had to write a rollback procedure in case something went wrong.
After this was done, we would submit our paper to engineers and managers that reviewed what we wrote, and told us if it was okay or not to perform the action. Writing these papers was sometimes tedious, but it helped me in knowing better how to think of every consequence an action on the network can have, and how to prevent unwanted behavior from a command or action.
The collaborators, in each team, had a whiteboard on which they could write whatever they want. Usually, they would write about what issues they have, or share their thoughts on how to improve something or some process.
Each morning, the managers and sometimes I walked to these boards to speak with the collaborators about what they wrote. My manager and I were taking only IT related requests or feedback. This helped us understand better how collaborators see things and work, and how we could help them.
For school and in order to validate my two years, I had to write a dissertation on a project that was given by my Amazon tutor. Unfortunately, the time I was there, there were no big projects that I could lead. For the dissertation, our school wanted us to manage a project from beginning to end. As it was not possible in my case, my tutor gave me a textbook case. He knew I liked monitoring, but that my experience with it was limited to small devices and equipment.
The project was to learn how to manage enterprise servers, and how I could use one to monitor other equipment that is very specific, like UPSes for example, things that differ from regular switches and routers. I had to manage a budget and buy the necessary equipment, make a pre-project study with a functional analysis, planning, and everything that is required to manage a project the best way possible.
After all the project management works were done and written down, I was ready to move on to the technical aspects of the project. I chose how to manage the power, the dual network interfaces, the hard disks and RAID type. This project helped me learn a lot about these three things, and in the end it even helped me with my personal projects. Once the operating system was installed on a well-prepared server, I had to choose, install, and configure a monitoring solution. Once again I went with Nagios because I wanted to learn even more about it.
Once the server was working properly, I ran some tests that I have also written down, and in the end I ended up with a well-written dissertation, for which I got an A grade. The oral presentation also went well, I got a B, which with all my other grades validated my diploma.