What is big data?
It’s actually quite simple: You have big data whenever a single host isn’t enough to either store or process your data.
What does it mean?
Suppose you have a PostgreSQL database and you run into scaling problems. You now have a choice: either get better hardware so that you can continue to work on a single host, or split the database to span multiple hosts.
Suppose you have a file store and all disks are full. You can either buy larger disks or use some distributed storage system where you just add hosts to expand the total storage capacity.
In both cases you are dealing with big data.
Big data (for me) isn’t about having X MB of data. It’s simply the point at which you decide to use a distributed system to handle your data.
Monitoring Thoughts
How would you scale monitoring and how would you ensure that with hundreds of thousands of events per minute you’ll still get the important ones?
A lot of stuff is missing here. This is merely a note on what I think a scalable architecture for monitoring should look like. Also, one should be able to do math on the events!
On Agents
RULE: events generated by Agents are stateless
- Run a monitoring agent on each node!
- Each agent performs a number of tasks. These are specifically called tasks since they are not necessarily checks. Also, I associate checks with Nagios checks, and that’s not what we want to do!
A task does one thing, and one thing only:
- Do not create tasks like the one NimSoft has (CMD: CPU/Memory/Disk), which covers several things at once
- Each task generates an event
- Everything is an event!
- A successful task is just a task that ran without programmatic errors
- A failed task is one where a programmatic error occurred!
- Create a JSON String from the event
- Submit the JSON string to some messaging middleware (preferably RabbitMQ)
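The agent steps above can be sketched roughly as follows. This is only a minimal illustration, not a finished agent: the event fields (`task`, `host`, `timestamp`, `status`, `data`, `error`), the helper names `run_task` and `submit`, and the example task `disk_free` are all assumptions of mine. The `publish` callable stands in for whatever client talks to the middleware, so nothing here depends on a specific RabbitMQ library.

```python
import json
import shutil
import socket
import time
from typing import Callable


def run_task(name: str, task: Callable[[], dict]) -> dict:
    """Run one task and wrap its outcome in a self-contained (stateless) event."""
    event = {"task": name, "host": socket.gethostname(), "timestamp": time.time()}
    try:
        event["data"] = task()
        event["status"] = "ok"          # the task ran without a programmatic error
    except Exception as exc:            # a programmatic error means the task failed
        event["status"] = "failed"
        event["error"] = repr(exc)
    return event


def submit(event: dict, publish: Callable[[str], None]) -> str:
    """Create a JSON string from the event and hand it to the middleware."""
    payload = json.dumps(event, sort_keys=True)
    publish(payload)                    # hypothetical hook: e.g. a RabbitMQ publish call
    return payload


def disk_free() -> dict:
    """Hypothetical task that does one thing only: report free bytes on '/'."""
    return {"free_bytes": shutil.disk_usage("/").free}


if __name__ == "__main__":
    # Printing stands in for the middleware here.
    submit(run_task("disk_free", disk_free), print)
```

Note that a failed task still produces an event. Whether a low disk counts as a problem is not decided by the agent; it only reports what it measured.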
On Middleware
- Messages must be persistent
- It is safe to restart the server!
- What are just messages to the middleware are the guts of the system. Those are the events generated by agents
On Servers
There are 2 kinds of servers:
- Persistence Servers: These run somewhere in a rack. They will grab one event after another from a queue and store them in a safe place for later reference.
Once a Persistence Server has grabbed an event from the queue, it is no longer visible to other servers. Each event will reside on one and exactly one Persistence Server.
- Notification Servers: These run on physical servers in a rack and grab one notification at a time from the messaging middleware.
- All Notification Servers can retrieve all events.
- Notification Servers can subscribe to a certain subset of topics.
- There may be a lot of servers. We don’t want our monitoring to fail.
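The difference between the two kinds of servers is really a difference in delivery: Persistence Servers compete for events on one shared queue, while every Notification Server gets its own copy of each event it subscribed to. Here is a small in-memory sketch of that idea. The `Broker` class and its method names are made up for illustration; a real setup would leave this to the middleware.

```python
from collections import deque
from typing import Optional


class Broker:
    """In-memory stand-in for the messaging middleware (illustration only)."""

    def __init__(self) -> None:
        self.work_queue: deque = deque()  # shared by all Persistence Servers
        self.subscribers: list = []       # one (topic, inbox) pair per Notification Server

    def subscribe(self, topic: Optional[str] = None) -> deque:
        """Register a Notification Server; topic=None means all events."""
        inbox: deque = deque()
        self.subscribers.append((topic, inbox))
        return inbox

    def publish(self, topic: str, event: dict) -> None:
        self.work_queue.append(event)           # exactly one persistence consumer takes it
        for wanted, inbox in self.subscribers:  # each matching subscriber gets a copy
            if wanted is None or wanted == topic:
                inbox.append(event)

    def take(self) -> Optional[dict]:
        """Called by a Persistence Server; a taken event is gone for the others."""
        return self.work_queue.popleft() if self.work_queue else None


broker = Broker()
all_events = broker.subscribe()        # Notification Server that sees everything
disk_only = broker.subscribe("disk")   # Notification Server for one topic
broker.publish("disk", {"task": "disk_free"})
broker.publish("cpu", {"task": "cpu_load"})
first = broker.take()                  # grabbed by Persistence Server A
second = broker.take()                 # grabbed by Persistence Server B
```

After this runs, `all_events` holds both events, `disk_only` holds only the disk event, and the two Persistence Servers each hold a different event.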
On Persistence Servers
- Subscribe to the global queue
- Start grabbing events
- Store the event on disk. What exactly storing means is yet to be determined!
- Start over again
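As a rough sketch of that loop: since what storing means is still open, this example simply assumes one JSON event per line appended to a file. The function name `persist_events` and the use of Python’s `queue.Queue` as a stand-in for the global queue are my assumptions, nothing more.

```python
import json
import queue
from pathlib import Path


def persist_events(events: "queue.Queue[str]", store: Path) -> int:
    """Drain the queue, appending each JSON event as one line to `store`."""
    stored = 0
    with store.open("a", encoding="utf-8") as out:
        while True:
            try:
                payload = events.get_nowait()  # grab the next event
            except queue.Empty:
                break                          # a real server would block and wait here
            # Re-serialize so that every event occupies exactly one line.
            out.write(json.dumps(json.loads(payload)) + "\n")
            out.flush()
            stored += 1
    return stored
```

Appending keeps the server simple and makes the file easy to replay later, but it says nothing about rotation, indexing or replication. Those questions belong to the "yet to be determined" part.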
On Notification Servers
- Subscribe to the notification queue or a topic queue
- Start grabbing events
- Display the event. What exactly displaying means is yet to be determined!
- Start over again
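The notification loop could look something like the sketch below. Displaying is still undefined, so this just assumes that each event becomes one line of text. The `topic`, `status`, `host` and `task` fields and the function name `notifications` are hypothetical, and in a real deployment the topic filtering would probably be done by the middleware rather than in the server.

```python
import json
from typing import Iterable, Iterator, Optional


def notifications(payloads: Iterable[str], topic: Optional[str] = None) -> Iterator[str]:
    """Yield one display line per event, optionally restricted to one topic."""
    for payload in payloads:
        event = json.loads(payload)
        if topic is not None and event.get("topic") != topic:
            continue  # this server is not subscribed to that topic
        status = event.get("status", "?")
        host = event.get("host", "?")
        task = event.get("task", "?")
        yield f"[{status}] {host}: {task}"
```

Because the function is a generator, the caller decides what displaying means: print the lines, push them to a dashboard, or send them somewhere else.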
