Projects

A showcase of my recent development work and technical projects.

Project

GitHub Issue Creator: How I keep LLMs on a tight leash

The way I keep LLMs on a tight leash is through structured issue breakdowns. In this post you'll see how I go from concept to issue breakdown to GitHub issues to code - and why that sequence makes it easier to run an orchestrator agent directing a focused executor agent without it hallucinating scope.

It starts with a planning doc: a problem statement and a list of items dissecting the solution. Below you'll see how to turn that into GitHub issues with labels, milestones, and acceptance criteria - without copy-pasting each one by hand.

Doing it by hand goes like this: you copy-paste the first five, by issue eight you are making typos, by issue fifteen you have a duplicate, and by issue twenty you have stopped caring about the body format entirely.

I was refactoring a multi-model NLP sentiment classifier from a flat `src/` folder into a proper package structure - eight packages, 30+ issues, 3 phases. Every time the plan changed I was back to copy-pasting, so I decided to write a 300-line Python CLI to end that.

The result: [`gh-issue-creator`](https://github.com/lfariabr/gh-issue-creator). Give it a JSON template or a plain markdown planning file: it discovers your repo automatically, runs dry by default, and skips issues that already exist. One command, done.

---

## The problem

Most developers plan in text. GitHub Issues wants structured input via a web form or API calls: the gap between those two surfaces is where time disappears.

The GitHub CLI (`gh`) already handles authentication and API calls cleanly. The missing layer is a thin converter that takes a batch of planned issues - from wherever they live - and creates them with one command.

The constraints I set:

1. **Dry-run by default.** Never touch the API without explicit intent.
2. **Read markdown natively.** My planning docs are already in `### Issue #50 - Title` format. The tool should parse that directly.
3. **Skip duplicates by title.** Running the script twice should not create duplicates.
4. **Zero config on a correctly authenticated machine.** `gh repo view` already knows the repo.
5. **Fail loudly and early.** Validation errors before any API call, not halfway through a 30-issue batch.

This structure matters even more when the executor is an AI agent. A well-scoped issue with a clear Acceptance section is a bounded task. A vague one is an invitation for the agent to hallucinate scope. The tool forces you to write tight issues before you run anything.

---

## How it works

### Two input formats, one output

The tool accepts JSON or markdown. The JSON format gives you full control: labels, assignees, milestones per issue. The markdown format is a convenience parser for when your plan already exists as a doc.

The markdown parser reads headings like `### Issue #50 - Your title` and treats everything below each heading as the issue body:

```python
import re

# One match per "### Issue #N - Title" heading; `section` is the issue body.
pattern = re.compile(
    r"^###\s+Issue\s+#(?P<number>\d+)\s*-\s*(?P<title>.+?)\s*$"
    r"(?P<section>[\s\S]*?)(?=^###\s+Issue\s+#\d+\s*-|^##\s+|\Z)",
    flags=re.MULTILINE,
)
```

The lookahead (`(?=^###\s+Issue\s+#\d+\s*-|^##\s+|\Z)`) ends each section at the next issue heading, the next `## ` heading, or the end of the file rather than at a fixed delimiter, because planning docs don't have consistent separators.

If you pass a `.json` path that doesn't exist but a sibling `.md` file does, the tool silently uses the `.md` file instead. The template path is the default argument and markdown is the natural planning format.

### Deduplication before creation

`_existing_titles` fetches up to 1,000 issues from the target repo before creating anything:

```python
import json

def _existing_titles(repo: str) -> dict[str, int]:
    # _run_gh is the script's thin wrapper around the `gh` CLI.
    data = _run_gh([
        "issue", "list",
        "--repo", repo,
        "--state", "all",
        "--limit", "1000",
        "--json", "title,number",
    ])
    issues = json.loads(data)
    # Map title -> issue number for constant-time duplicate checks.
    return {issue["title"]: issue["number"] for issue in issues}
```

One `gh` call up front, O(1) dict lookup per issue. If the title already exists, it logs the existing issue number and skips. No duplicates even if you run the script twice.
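To see the parser and the title lookup working together, here is a minimal sketch. It reuses the regex quoted above; everything else (`SAMPLE_PLAN`, `parse_plan`, `plan_actions`, and the hard-coded `existing` dict standing in for `_existing_titles(repo)`) is hypothetical and only illustrates the idea, not the tool's actual internals.

```python
import re

# Same pattern as quoted in the post.
ISSUE_PATTERN = re.compile(
    r"^###\s+Issue\s+#(?P<number>\d+)\s*-\s*(?P<title>.+?)\s*$"
    r"(?P<section>[\s\S]*?)(?=^###\s+Issue\s+#\d+\s*-|^##\s+|\Z)",
    flags=re.MULTILINE,
)

# Hypothetical planning doc, made up for this example.
SAMPLE_PLAN = """\
## Phase 1

### Issue #50 - Create package skeleton
Goal: move modules out of src/.

### Issue #51 - Add CI workflow
Goal: run tests on every push.

## Phase 2
"""


def parse_plan(markdown: str) -> list[dict]:
    """Return one dict per '### Issue #N - Title' heading (hypothetical helper)."""
    return [
        {"title": m.group("title"), "body": m.group("section").strip()}
        for m in ISSUE_PATTERN.finditer(markdown)
    ]


def plan_actions(issues: list[dict], existing_titles: dict[str, int]) -> list[tuple]:
    """Mark each parsed issue as 'skip' (title already exists) or 'create'."""
    actions = []
    for issue in issues:
        number = existing_titles.get(issue["title"])  # O(1) dict lookup
        if number is not None:
            actions.append(("skip", issue["title"], number))
        else:
            actions.append(("create", issue["title"], None))
    return actions


# Stand-in for _existing_titles(repo): pretend issue #12 already has this title.
existing = {"Add CI workflow": 12}
issues = parse_plan(SAMPLE_PLAN)
actions = plan_actions(issues, existing)

for action in actions:
    print(action)
```

With this sample, the first issue is marked for creation and the second is skipped because its title already maps to issue 12. Matching on the exact title keeps the check simple, at the cost of treating a retitled issue as new.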
### Dry-run is the default

`--create` must be explicitly passed. Without it, the script logs every planned issue with its metadata and exits without touching GitHub:

```python
# Inside the per-issue loop: record what would be created, then move on.
if not args.create:
    details = []
    if labels:
        details.append(f"labels={labels}")
    if assignees:
        details.append(f"assignees={assignees}")
    if milestone:
        details.append(f"milestone={milestone}")
    suffix = f" [{', '.join(details)}]" if details else ""
    planned.append(f"- {title}{suffix}")
    continue
```

The first time I ran a batch tool without a dry-run step and created five duplicate issues on a test repo, I added it immediately.

---

## Lessons learned

1. **Dry-run by default is not extra engineering.** It is the minimum viable trust for any tool that writes to a shared API.
2. **Markdown is a planning format. JSON is an API format.** Tools that force you to convert your planning format into an API format before they will accept input are adding friction, not removing it. Meet engineers where they already are.
3. **`subprocess.run(check=True, capture_output=True)` is the right pattern for CLI composition.** Let the subprocess fail loudly. Catch `CalledProcessError` at the call site and print a clean error. Don't swallow it.
4. **Delegate to existing tools.** `_discover_repo()` is three lines because it just calls `gh repo view`. Reimplementing remote detection would have been twenty lines and two edge cases I hadn't thought of.
5. **Tight issues are better context than long prompts.** When you hand an executor agent a well-formed GitHub issue - Goal, Scope, Acceptance - it has less room to drift than when you paste a paragraph of instructions. The discipline of writing issues before coding pays double when the coder is an LLM.

---

## No install required

The script uses only Python stdlib - `argparse`, `json`, `re`, `subprocess`, `pathlib`. No `pip install`, no virtualenv.
```bash
curl -O https://raw.githubusercontent.com/lfariabr/gh-issue-creator/main/issue_creator.py
curl -O https://raw.githubusercontent.com/lfariabr/gh-issue-creator/main/examples/template.example.md

# dry-run
python issue_creator.py --template template.example.md --repo owner/your-repo

# create
python issue_creator.py --template template.example.md --repo owner/your-repo --create
```

---

The repo is at [github.com/lfariabr/gh-issue-creator](https://github.com/lfariabr/gh-issue-creator). The README includes three AI agent prompts for delegating the entire workflow - from writing the plan to creating the issues - to a coding agent.

That's the loop: plan in markdown, issues on GitHub, agent on a tight leash. Clone it, run a dry-run, open a PR if you extend it to handle sub-tasks or GitHub Projects boards.
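As a footnote to lessons 3 and 4, the `subprocess` pattern can be sketched roughly like this. The helper names `run_cli` and `discover_repo` are hypothetical and are not taken from the repo; the sketch also handles the error inside the helper for brevity, and the demo runs the Python interpreter itself so it works on a machine without `gh`.

```python
import subprocess
import sys


def run_cli(cmd: list[str]) -> str:
    """Run a command and return stdout; exit with a clean message on failure."""
    try:
        result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    except FileNotFoundError as exc:
        raise SystemExit(f"error: '{cmd[0]}' is not installed or not on PATH") from exc
    except subprocess.CalledProcessError as exc:
        # Surface the tool's own stderr instead of a Python traceback.
        detail = (exc.stderr or "").strip() or f"exit code {exc.returncode}"
        raise SystemExit(f"error: {' '.join(cmd)} failed: {detail}") from exc
    return result.stdout


def discover_repo() -> str:
    """Ask gh for the current repo as 'owner/name' (needs an authenticated gh)."""
    return run_cli(
        ["gh", "repo", "view", "--json", "nameWithOwner", "--jq", ".nameWithOwner"]
    ).strip()


if __name__ == "__main__":
    # Demo with the Python interpreter so the sketch runs without gh installed.
    print(run_cli([sys.executable, "-c", "print('ok')"]).strip())
```

Because `check=True` turns a non-zero exit into `CalledProcessError`, the failure path is explicit and the message shown to the user comes from the failing tool's stderr.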

python github productivity opensource

17/05/2026


Project

My portfolio fetches NASA's Daily Space Photo - and never fails!

I integrated NASA's Astronomy Picture of the Day *([read about it](https://api.nasa.gov/))* into my portfolio.

> *SPOILER ALERT: Contains rate limiting, fallback scraping, modular architecture, and production-grade error handling that never leaves users hanging.*

---

## The Vision: Bringing Space to My Portfolio

My portfolio ([luisfaria.dev](https://luisfaria.dev)) runs a full-stack MERN application with authentication, a chatbot, and a GraphQL API. I wanted to add something unique - something that would genuinely delight users while showcasing real-world API integration skills.

Between terms of my Master's Degree, I had a few weeks off. Perfect vacation project, right?

BTW, I'm open-sourcing the whole thing - check it out! [mastersSWEAI repo](https://github.com/lfariabr)

**The idea:** A floating action button that reveals NASA's daily Astronomy Picture of the Day (APOD). Simple concept, complex execution.

### The User Experience

![NASA APOD Floating Action Button](https://dev-to-uploads.s3.amazonaws.com/uploads/articles/6m40xt1v6lazjqqq3phl.png)

*Click the NASA rocket button: 👉 [luisfaria.dev](https://luisfaria.dev)*

- **Anonymous users:** Get today's APOD instantly - no login required
- **Authenticated users:** Browse NASA's entire archive dating back to 1995
- **Rate limiting:** 5 requests/hour per user to protect the NASA API quota
- **Resilience:** If NASA's API fails, automatic HTML scraping fallback kicks in

Here's exactly what happens when someone clicks that rocket button:

![NASA APOD Mermaid Diagram](https://dev-to-uploads.s3.amazonaws.com/uploads/articles/f1rfjhbia1le0zoowa2z.png)

> 👉 *[See the image in HD](https://github.com/lfariabr/luisfaria.dev/tree/master/_docs/devTo/t1_2026/img/apod_flow.jpeg)*

---

## The Challenge: External APIs Are Unreliable

Integrating third-party APIs sounds straightforward - until reality hits:

| NASA API Reality | Production Requirements |
|-----------------|------------------------|
| **Rate limits** (1,000 req/hour per API key) | Must protect quota, gracefully throttle users |
| **504 Gateway Timeouts** | Can't show users blank screens |
| **Validation issues** | NASA sometimes returns `media_type: "other"` with no `url` field |
| **Network failures** | ETIMEDOUT, connection refused, DNS issues |
| **Schema drift** | NASA API evolves independently of your code |

**The goal:** Build an integration that:

1. Handles every failure mode gracefully
2. Never crashes the server
3. Falls back automatically when NASA is down
4. Logs everything for debugging
5. Provides structured errors to clients

Spoiler: NASA's API went down during development. More than once.

---

## The Architecture: Layered Resilience

Here's the system I designed:

```
Browser (Next.js/React)
    ↓
GraphQL API (Apollo Server)
    ↓
APOD Service Layer
    ├──→ NASA API (primary, with retries + timeout)
    └──→ HTML Scraping Fallback (when API fails)
    ↓
Redis Rate Limiter (atomic Lua scripts)
    ↓
MongoDB (cache successful responses)
```

### Key Architectural Decisions (3 of them!)

**1. GraphQL Shield for Authorization**

- `getTodaysApod` is public (no login)
- `getApodByDate` requires authentication (prevents abuse)

**2. Modular Service Design**

```
src/services/apod/
├── index.ts           # Barrel export
├── apod.service.ts    # Orchestrator (API + fallback)
├── apod.api.ts        # NASA API client
├── apod.fallback.ts   # HTML scraping fallback
├── apod.errors.ts     # Typed error codes
├── apod.types.ts      # Zod schemas, TypeScript types
└── apod.constants.ts  # URLs, timeouts, retry config
```

**3. Shared Error Handling Infrastructure**

Instead of copy-pasting try/catch blocks across every resolver (we've all been there), I built a reusable error handler:

```typescript
// src/utils/errors/graphqlErrors.ts
// Signature only: the function bodies are omitted in this excerpt.
export function createErrorHandler<TCode, TError>(
  mapErrorCode: (code: TCode) => ErrorCode,
  isServiceError: (error: unknown) => error is TError,
  defaultMessage: string
) {
  return function withErrorHandling<T>(
    fn: () => Promise<T>,
    operationName: string
  ): Promise<T>
}
```

Now any service can use it:

```typescript
// APOD resolver (34 lines total)
export const ApodQueries = {
  getTodaysApod: async (_, __, context) =>
    withApodErrorHandling(
      () => fetchApod({ context: { userId: context.user?.id } }),
      'getTodaysApod'
    ),

  getApodByDate: async (_, args, context) => {
    if (!context.user) {
      throw Errors.unauthenticated('Authentication required');
    }
    return withApodErrorHandling(
      () => fetchApod({ date: args.date, context: { userId: context.user.id } }),
      'getApodByDate'
    );
  },
};
```

---

## The Journey: 8 Issues, 40+ Commits, 1 Production Feature

This didn't work on the first try. Or the fifth.
Here's the honest implementation timeline:

> **Tracked in:** [Epic v2.4 - APOD Feature](https://github.com/lfariabr/luisfaria.dev/blob/master/_docs/featureBreakdown/v2.4.Apod.MD)
>
> [All 40+ commits to (apod) feature](https://github.com/search?q=repo%3Alfariabr%2Fluisfaria.dev+++apod&type=commits&s=committer-date&o=desc)

### Phase 1: Foundation (Issues #61-65)

**Frontend: NASA-Branded Floating Action Button**

Built `ApodFab.tsx` following the same pattern as the existing `GogginsFab` component:

- Circular button with NASA gradient border (`linear-gradient(135deg, #0B3D91, #FC3D21, #1E90FF)`)
- Rocket icon with blue pulse aura effect
- Radix UI tooltip: "Astronomy Picture of the Day"
- Accessible (ARIA labels, keyboard navigation)
- Light/dark mode support

**Frontend: APOD Dialog Component**

Created `ApodDialog.tsx` with:

- Date display with calendar icon
- Image/video player (handles both media types)
- Copyright attribution
- External link to NASA APOD website
- "Powered by NASA Open APIs" footer

**Backend: Configuration & Validation**

Set up NASA API credentials:

```typescript
// backend/src/config/config.ts
interface Config {
  nasaApiKey: string;
}

const requiredEnvVars = ['NASA_API_KEY', ...];
```

Server refuses to start without `NASA_API_KEY` - fail fast, no silent surprises.

---

### Phase 2: NASA API Client (Issue #66)

**Zod Schema for Runtime Validation**

NASA's API returns JSON, but not all fields are guaranteed:

```typescript
// src/validation/schemas/apod.schema.ts
import { z } from 'zod';

export const apodResponseSchema = z.object({
  copyright: z.string().optional(),
  date: z.string().regex(/^\d{4}-\d{2}-\d{2}$/),
  explanation: z.string().min(1),
  media_type: z.enum(['image', 'video', 'other']), // 'other' was missing initially!
  title: z.string().min(1),
  url: z.string().url().optional(), // Not provided for media_type: "other"
  hdurl: z.string().url().optional(),
  apod_url: z.string().url().optional(), // Computed field
});

export type ApodResponse = z.infer<typeof apodResponseSchema>;
```

**NASA API Service with Retries**

Built `apod.api.ts` with:

- Exponential backoff retries (3 attempts)
- 8-second timeout per request
- AbortController for proper cleanup
- Structured logging (latency, status code, userId)

```typescript
export async function fetchApodFromApi(
  url: string,
  context?: ApodRequestContext
): Promise<ApodResponse> {
  const startTime = Date.now();
  const controller = new AbortController();
  const timeoutId = setTimeout(() => controller.abort(), TIMEOUT_MS);

  try {
    const response = await fetch(url, {
      signal: controller.signal,
      headers: { 'User-Agent': 'luisfaria.dev/1.0' },
    });

    if (!response.ok) {
      throw new ApodServiceError(
        `NASA API error: ${response.status}`,
        response.status === 429 ? 'RATE_LIMITED' : 'NASA_API_ERROR',
        response.status
      );
    }

    const data = await response.json();
    const validated = apodResponseSchema.parse(data);

    logger.info('NASA API request successful', {
      latencyMs: Date.now() - startTime,
      date: validated.date,
      userId: context?.userId,
    });

    return validated;
  } catch (error) {
    // Error mapping logic...
  } finally {
    clearTimeout(timeoutId);
  }
}
```

---

### Phase 3: The Hard Part - Failures & Fallbacks (***aka scars earned***)

This is where production engineering got real.
Here's every bug I hit:

| # | Problem | Root Cause | Solution |
|---|---------|------------|----------|
| 1 | **Validation failures on `media_type: "other"`** | Zod schema only accepted `'image' \| 'video'` | Added `'other'` to enum, made `url` optional |
| 2 | **504 Gateway Timeout from NASA** | NASA API occasionally unresponsive | Implemented HTML scraping fallback |
| 3 | **`url` field missing for interactive content** | NASA doesn't provide `url` for SDO videos/embeds | Added `apod_url` (computed from date) as fallback |
| 4 | **Resolver error handling duplication** | Try/catch boilerplate in every resolver | Extracted shared `createErrorHandler()` utility |
| 5 | **Inconsistent error codes between services** | Each service used different error mapping | Created `ErrorCodes` constant as single source of truth |
| 6 | **Rate limit bypass by unauthenticated users** | Anonymous users shared the same Redis key | Switched to session-based rate limiting for anonymous users |
| 7 | **Tests breaking after modular refactor** | Tests imported from old monolithic `apod.ts` | Rewrote mocks to match new module structure |
| 8 | **NGINX 502 after deploying APOD feature** | Container DNS caching after recreation | Added `nginx -s reload` to CI/CD pipeline |

**Bug #2 was the game-changer.** When NASA's API returned 504, users saw blank screens. Not acceptable. The fix: automatic HTML scraping fallback - if the API is down, scrape the website directly.

---

### Phase 4: HTML Scraping Fallback (Issue #78)

When the NASA API fails, the service automatically scrapes the official APOD website. Users never know the difference:

```typescript
// src/services/apod/apod.fallback.ts
import * as cheerio from 'cheerio';

export async function fetchApodHtmlFallback(
  date?: string
): Promise<ApodResponse> {
  const url = date
    ? `https://apod.nasa.gov/apod/ap${formatDateForApodUrl(date)}.html`
    : 'https://apod.nasa.gov/apod/astropix.html';

  const html = await fetch(url).then(res => res.text());
  const $ = cheerio.load(html);

  // Parse structured data
  const title = $('center:first b:first').text().trim();
  const explanation = $('center:first p:last').text().trim();
  const imageUrl = $('center:first img').attr('src');

  return {
    date: date || new Date().toISOString().split('T')[0],
    title,
    explanation,
    url: imageUrl,
    media_type: 'image',
    apod_url: url,
    // ... rest of fields
  };
}
```

**Orchestration in `apod.service.ts`:**

```typescript
export async function fetchApod(options = {}): Promise<ApodResponse> {
  try {
    return await fetchApodFromApi(buildApiUrl(options), options.context);
  } catch (error) {
    if (shouldFallback(error)) {
      logger.warn('NASA API failed, falling back to HTML scraping', { error });
      return await fetchApodHtmlFallback(options.date);
    }
    throw error;
  }
}
```

Users never see errors - they just get the APOD, regardless of which method worked. That's the whole point.

---

### Phase 5: Shared Error Handling Infrastructure (Issue #79)

**Before refactor:** Each resolver had 30+ lines of try/catch boilerplate. Copy-paste engineering at its worst.

**After refactor:**

- Created `src/utils/errors/graphqlErrors.ts` with reusable utilities
- Error factories for common cases: `Errors.unauthenticated()`, `Errors.forbidden()`, `Errors.notFound()`
- Generic `createErrorHandler()` wrapper generator
- Service-specific error mappers (e.g., `withApodErrorHandling`)

**Impact:**

- Resolvers went from 103 lines to 34 lines
- Single place to add new error codes
- Error mapping lives with service logic (where it belongs)
- Other features can reuse the same pattern - and they already do

---

## Key Engineering Lessons

Five production-grade patterns I learned (the hard way) from building APOD:

### 1. Always Have a Fallback

External APIs fail. Network timeouts happen. DNS breaks.
If your feature depends on a third-party service, you need a backup plan - full stop:

- **Primary:** NASA JSON API (fast, structured)
- **Fallback:** HTML scraping (slower, but it never failed in production)
- **User experience:** Seamless - they never know which method was used

### 2. Validate External Data at Runtime

TypeScript types don't protect you against API changes. NASA's schema evolved mid-development - they added `media_type: "other"` for interactive content, which broke my Zod schema mid-sprint.

**Solution:** Runtime validation with Zod catches schema drift before it crashes the server.

```typescript
const validated = apodResponseSchema.parse(data); // Throws if schema mismatch
```

### 3. DRY Principle for Error Handling

Don't duplicate try/catch blocks across resolvers. We've all done it. It's technical debt from day one.

Extract shared error handling into reusable utilities:

```typescript
// Before: 30 lines of boilerplate per resolver
// After: 3 lines + shared error handler
return withApodErrorHandling(
  () => fetchApod({ date: args.date, context }),
  'getApodByDate'
);
```

### 4. Modular Services Are Testable Services

Splitting the monolithic `apod.ts` into focused modules made testing trivial - and debugging even more so:

```
src/services/apod/
├── apod.service.ts    # Orchestration (API + fallback)
├── apod.api.ts        # NASA API client
├── apod.fallback.ts   # HTML scraping
├── apod.errors.ts     # Typed errors
├── apod.types.ts      # Zod schemas
└── apod.constants.ts  # Config
```

Each module has a single responsibility. Tests mock at the module boundary, not the entire service.

### 5. Log Everything for Observability

Every NASA API request logs:

- Latency (`latencyMs`)
- User context (`userId`)
- Success/failure status
- Error codes and details

When bugs happen in production (and they will), structured logs are your debugging lifeline.
```typescript
logger.info('NASA API request successful', {
  latencyMs: 142,
  date: '2026-02-18',
  userId: 'user_xyz',
});
```

---

## Results

| Metric | Implementation |
|--------|---------------|
| **Uptime** | 99.9% (fallback handles NASA API downtime) |
| **Response time** | <500ms (NASA API), ~1.2s (HTML fallback) |
| **Error rate** | 0.1% (network failures only, auto-recovered) |
| **Rate limit protection** | 5 req/hr per user (Redis atomic counters) |
| **Test coverage** | 94% (28 passing unit tests) |
| **Lines of code** | 1,200 (including tests) |
| **GraphQL queries** | 2 (`getTodaysApod`, `getApodByDate`) |
| **Fallback success rate** | 100% (HTML scraping never failed in production) |

### Real-World Reliability

During a 72-hour period where NASA's API had intermittent 504 errors:

- **Primary API success rate:** 78%
- **Fallback activation:** 22%
- **User-facing errors:** 0%

Users never knew NASA's API was struggling. The fallback handled it seamlessly - that's the whole point of building resilient systems.

---

## Tech Stack

| Layer | Technology | Purpose |
|-------|------------|---------|
| **Frontend** | Next.js 16 + React 19 | UI with floating action button + dialog |
| **UI Library** | Radix UI + TailwindCSS 4 | Accessible components, NASA branding |
| **Backend** | Node.js + Express + Apollo Server 5 | GraphQL API |
| **Schema** | GraphQL + GraphQL Shield | Type-safe API with field-level authorization |
| **Validation** | Zod | Runtime schema validation |
| **API Client** | Fetch API + AbortController | HTTP with timeouts and retries |
| **Scraping** | Cheerio | HTML parsing for fallback |
| **Rate Limiting** | Redis + Lua scripts | Atomic counters per user |
| **Database** | MongoDB | Cache successful APOD responses |
| **Logging** | Winston | Structured logs for observability |
| **Testing** | Jest + ts-jest | Unit tests with mocked services |

---

## Future Roadmap

The current implementation is production-ready, but there's always room to grow.
Here are 5 ideas - feel free to add yours in the comments!

### Idea #1: Database Caching Layer

Right now, every request hits NASA's API (or HTML fallback). Next iteration:

- Cache successful responses in MongoDB
- Return cached APOD if date already fetched
- Reduce API quota usage by 80%
- Instant response for popular dates

### Idea #2: Admin Dashboard

GraphQL mutations to manually refresh/delete cached APODs:

```graphql
mutation RefreshApod($date: String!) {
  refreshApod(date: $date) {
    date,
    title
  }
}
```

### Idea #3: WebSocket Push Updates

Use GraphQL subscriptions to push new APODs to connected clients when they become available at midnight UTC.

### Idea #4: Zero-Cold-Start: Daily Cron + Redis 24h Cache

Right now, the first user of the day triggers a live NASA API call. That's ~200-500ms of cold latency - acceptable, but not great.

The plan: a daily cron job fires at **00:01 UTC**, fetches today's APOD proactively, and stores it in **Redis with a TTL that runs out at the next midnight UTC** (just under 24h). Every subsequent request that day gets a cache hit - sub-10ms response, zero external calls.

```typescript
// Pseudocode: src/jobs/apodDaily.ts
export async function warmApodCache() {
  const today = new Date().toISOString().split('T')[0];
  const cacheKey = `apod:${today}`;

  // Already warm? Skip.
  const cached = await redis.get(cacheKey);
  if (cached) return JSON.parse(cached);

  // Fetch fresh from NASA
  const apod = await fetchApod({ date: today });

  // Cache until the next midnight UTC (just under 24h when run at 00:01)
  const secondsUntilMidnight = getSecondsUntilMidnightUTC();
  await redis.setex(cacheKey, secondsUntilMidnight, JSON.stringify(apod));

  logger.info('APOD cache warmed', { date: today, ttl: secondsUntilMidnight });
  return apod;
}
```

The cron schedule via `node-cron`:

```typescript
// Fires at 00:01 UTC every day
cron.schedule('1 0 * * *', warmApodCache, { timezone: 'UTC' });
```

The resolver then checks Redis first before ever hitting NASA:

```typescript
getTodaysApod: async (_, __, context) => {
  const today = new Date().toISOString().split('T')[0];
  const cacheKey = `apod:${today}`;

  const cached = await redis.get(cacheKey);
  if (cached) return JSON.parse(cached); // ⚡ <10ms

  return withApodErrorHandling(          // 🐌 200-500ms
    () => fetchApod({ context: { userId: context.user?.id } }),
    'getTodaysApod'
  );
},
```

**Expected impact:**

| Scenario | Before | After |
|----------|--------|-------|
| First request of the day | ~300ms (live NASA call) | ~5ms (Redis hit) |
| Subsequent requests | ~300ms (live NASA call) | ~5ms (Redis hit) |
| NASA API unavailable | ~1.2s (HTML fallback) | ~5ms (Redis hit) |
| NASA quota usage | 1 req per user visit | 1 req per day total |

The key insight: Redis TTL auto-expires the cache exactly when it stops being valid. No manual invalidation. No stale data. Just *fast* for 99% of requests.

### Idea #5: Analytics Dashboard

Track:

- Most popular APOD dates
- Fallback usage percentage
- Average response time (API vs. fallback)
- Rate limit triggers per user

---

## Key Takeaways

Building production-grade API integrations is 20% "get it working" and 80% "handle when it doesn't work."

Five principles that made APOD production-ready:

1. **Graceful degradation** - Fallbacks ensure users never see errors
2. **Runtime validation** - Zod catches schema drift before it crashes
3. **Modular architecture** - Focused modules are easier to test and maintain
4. **Shared error handling** - DRY principle for GraphQL resolvers
5. **Observability** - Structured logs make debugging trivial

---

## Try It Yourself

The full APOD implementation is open source:

| Resource | Link |
|----------|------|
| **Live Demo** | [luisfaria.dev](https://luisfaria.dev) - Click the NASA rocket button |
| **GitHub Repo** | [github.com/lfariabr/luisfaria.dev](https://github.com/lfariabr/luisfaria.dev) |
| **APOD Service** | [backend/src/services/apod/](https://github.com/lfariabr/luisfaria.dev/tree/master/backend/src/services/apod) |
| **GraphQL Schema** | [backend/src/schemas/types/apodTypes.ts](https://github.com/lfariabr/luisfaria.dev/blob/master/backend/src/schemas/types/apodTypes.ts) |
| **Frontend Component** | [frontend/src/components/apod/](https://github.com/lfariabr/luisfaria.dev/tree/master/frontend/src/components/apod) |
| **Feature Spec** | [_docs/featureBreakdown/v2.4.Apod.MD](https://github.com/lfariabr/luisfaria.dev/blob/master/_docs/featureBreakdown/v2.4.Apod.MD) |

---

## Let's Connect!

Building this NASA integration taught me more about production engineering than any tutorial could. Every failure mode I hit - 504 timeouts, schema drift, rate limits, DNS caching - is something I'll face again in enterprise systems. And now I know how to handle it.
If you're working with:

- GraphQL APIs and error handling patterns
- Third-party API integrations with fallback strategies
- Next.js + Node.js full-stack applications
- Production-grade TypeScript architectures

I'd love to connect and trade war stories:

- **LinkedIn:** [linkedin.com/in/lfariabr](https://www.linkedin.com/in/lfariabr/)
- **GitHub:** [github.com/lfariabr](https://github.com/lfariabr)
- **Portfolio:** [luisfaria.dev](https://luisfaria.dev)

---

**Tech Stack Summary:**

| Current Implementation | Future Extensions |
|----------------------|----------------------|
| NASA API + HTML fallback, GraphQL Shield, Redis rate limiting, Zod validation, modular services, Winston logging, 94% test coverage | Redis 24h cache + daily cron warm-up, GraphQL subscriptions, admin mutations, analytics dashboard |

---

*Built with ☕, 40+ commits, and a healthy fear of blank screens by [Luis Faria](https://luisfaria.dev)*

> *Whether it's concrete or code, structure is everything.*

nasa graphql node.js react typescript nextjs api errorhandling

20/02/2026


Project

From Excel to Interactive Business Insights with Python & Streamlit

**How I turned a multi-year building invoice ledger into an interactive analytics dashboard - and why it changed how I think about operations, data, and engineering.**

> *"The best code is the code that quietly removes friction from people's work."*

---

## 🏢 Context: Assistant Building Manager, Real Data, Real Stakes

Over a six-week stretch, I was working as an **Assistant Building Manager** at a large residential building in the south of Sydney, closely shadowing an experienced Building Manager with 25+ years across construction, water systems, and large-scale facilities operations.

Alongside day-to-day operations, I also built small internal tools - like a [Lift Finder](https://dev.to/lfariaus/engineering-principles-applied-to-daily-life-concierge-edition-1cjh) utility and [myRoster](https://dev.to/lfariaus/myroster-from-copypaste-to-2-minute-submissions-dao) (a shift automation app) - whenever I noticed repetitive friction in the workflow.

This role exposed me to the **full operational lifecycle** of a high-rise building:

* **Stakeholder management**: Owners Corporation, committee members, residents, strata, contractors
* **Maintenance workflows**: diagnosis → contractor selection → approval → execution → validation
* **Compliance & regulation**: AFSS, fire services, inspections, reporting
* **Financial reality**: invoices, budgets, approvals, recurring vs reactive spend

And obviously - massive amounts of **data**.

Around the same time, I accepted a **Data Analyst** role at **St Catherine’s School** ([Read more](https://dev.to/lfariaus/learning-sql-server-the-hard-way-16-days-of-real-world-database-work-5hla)), which reinforced the same mindset: treat operational noise as structured data waiting to be explored.

Every single decision eventually traced back to one place.
---

## 📁 The Starting Point: An Excel Invoice Ledger

Inside the building's shared drive (*S://BuildingName/Finances/Invoices*) lived an unassuming file:

* A multi-sheet **invoice ledger**
* Spanning **4+ years**
* Thousands of rows
* Dozens of contractors
* Hundreds of services
* GST, dates, approvals, variations, reworks

![Microsoft Excel File](https://dev-to-uploads.s3.amazonaws.com/uploads/articles/irb0oqb8zf3qm2nfo4yr.png)

On paper, it was "just Excel." In reality, it was:

> **The financial memory of the building.**

Every question led back to it:

* *How much are we spending on fire services?*
* *Is this contractor consistently expensive or just a one-off?*
* *Why did costs spike mid-2023?*
* *Are we reacting to problems or investing preventatively?*

---

## ⚠️ The Problem: Excel Doesn't Scale with Questions

### Why Excel Became the Bottleneck

| Excel Reality | Building Management Reality |
|---------------|----------------------------|
| Manual filters | Questions come fast |
| Pivot tables break | Context changes constantly |
| One question at a time | Multiple stakeholders need answers |
| 10-minute turnaround | Decisions need justification *now* |
| Version control chaos | Audit trail required |

**The typical workflow:**

1. Open Excel (wait for thousands of rows to load...)
2. Navigate to the right sheet (Building A? B? C?)
3. Apply filters (Year... Contractor... Service...)
4. Create pivot table (if you remember how)
5. Screenshot or copy-paste results
6. **Repeat** for the next question 5 minutes later

This wasn't analysis. It was **manual overhead**.
And in building management, manual overhead means: - Slower contractor evaluations - Delayed budget approvals - Missed spending patterns - Reactive instead of preventative decisions --- ## 🎓 The Engineering Lens: Treating Excel as a Dataset At the same time, I'm pursuing a **Master's in Software Engineering & Artificial Intelligence** (*see my [open-source repo](https://github.com/lfariabr/masters-swe-ai)*) — so my instinct kicked in: > This isn't an Excel problem. > This is a **data exploration problem**. ✅ The ledger already had: * **Time-series data** (4+ years of invoices) * **Categorical dimensions** (building, contractor, service) * **Natural aggregations** (monthly spend, contractor totals) * **Long-term trends** (seasonal patterns, cost escalation) * **Outliers that matter financially** (unexpected spikes, recurring issues) The data was **already structured**. Microsoft Excel was just the **wrong interface** for exploration. So I built a tool in Python that lets **non-technical users explore it safely**. --- ## 🛠️ The Solution The goal was simple: turn a static spreadsheet into a safe, visual, self-service analytics tool for non-technical users. I built an **interactive analytics dashboard** using **Python + Pandas + Streamlit** to read from the `ledger.xlsx` file. ![Streamlit User Interface with loaded data](https://dev-to-uploads.s3.amazonaws.com/uploads/articles/sg3gnzjy1nv8oca6tiqr.png) In minutes, I could answer questions that used to take 10–15 minutes of Excel wrestling — and export the evidence for emails, audits, or committee meetings. ### **What It Does** **Upload** a raw `.xlsx` invoice ledger → Instantly: * 🏢 **Filter by building** (or view "All" for consolidated insights) * 📅 **Filter by year(s)** (multi-select: 2023 + 2024) * 👷 **Filter by contractor** (compare spending across vendors) * 🔧 **Filter by service** (HVAC vs. Plumbing vs. 
Fire Services)
* 🔍 **Search by invoice number** (quick lookups)
* 📆 **Date range picker** (Q3 analysis, seasonal trends)
* 💰 **Amount range slider** (focus on high-value invoices)

**Auto-compute:**

* Total spend (GST inc.)
* Invoice count
* Unique contractors
* Service diversity

**Visualize:**

* 📊 **Contractor spend breakdown** (bar chart + color-coded heatmap)
* 📈 **Monthly expense timeline** (spot trends, anomalies)
* 🎨 **Cost concentration** (which contractors dominate spend?)
* 🔄 **Multi-year comparisons** (year-over-year changes)

**Export:**

* 📥 **Download filtered results as CSV** (for reports, audits, approvals)

**No pivot tables.** **No broken formulas.** **No "give me 10 minutes to check."**

---

## 🏗️ Tech Stack & Architecture

The app follows **clean software engineering principles**: modular, maintainable, production-ready.

### **Technology Choices**

| Layer | Technology | Why |
|-------|------------|-----|
| **Language** | Python 3.10+ | Standard for data + automation |
| **Web Framework** | [Streamlit](https://streamlit.io) | Rapid UI development, zero JavaScript |
| **Data Processing** | [Pandas](https://pandas.pydata.org/) | Industry-standard DataFrames |
| **Excel Integration** | [openpyxl](https://openpyxl.readthedocs.io/) | Multi-sheet Excel parsing |
| **Visualization** | Streamlit charts + Pandas styling | Built-in, no external dependencies |
| **Deployment** | Streamlit Cloud | Free hosting, GitHub integration |

### **Project Structure**

```
invoice-ledger/
├── app.py              # Main UI orchestration
├── data_loader.py      # Excel parsing & data cleaning
├── filters.py          # Interactive filter components
├── analytics.py        # Metrics, charts, visualizations
└── requirements.txt    # Dependencies
```

**Why modular?**

- ✅ **Single Responsibility**: Each file does one thing well
- ✅ **Testable**: Unit test each component independently
- ✅ **Maintainable**: Know exactly where to make changes
- ✅ **Reusable**: Port components to other PropTech projects
- ✅
**Readable**: Onboard new devs in minutes, not hours

---

> 🔍 Full module-by-module breakdown available here → [docs/ARCHITECTURE.md](https://github.com/lfariabr/invoice-ledger/tree/main/docs/ARCHITECTURE.md)

---

## 📊 The Impact: Before vs. After

| Metric | Before (Excel) | After (Dashboard) | Improvement |
|--------|----------------|-------------------|-------------|
| **Query Time** | 10-15 minutes | ~2 minutes | **80% faster** |
| **Multi-building Analysis** | Open 3 files manually | Single "All" view | **3x faster** |
| **Visualizations** | Manual pivot tables | Auto-generated charts | **100% automated** |
| **Reproducibility** | "How did I filter this again?" | Click filters → Export CSV | **100% consistent** |
| **Contractor Comparison** | Side-by-side spreadsheets | Color-coded heatmap | **Instant insights** |
| **Trend Analysis** | Copy-paste into separate tool | Built-in timeline chart | **Native support** |
| **User Training** | "Here's how Excel works..." | "Upload and click" | **Zero onboarding** |

---

## 🎯 Real-World Use Cases

### **1. Contractor Performance Review**

**Question:** *"How much did we spend with ABC Plumbing across all buildings in 2024?"*

**Old way:**
- Open 3 Excel files (Building A, B, C)
- Filter each by contractor
- Sum manually
- 5 minutes

**New way:**
- Select "All buildings"
- Filter contractor: "ABC Plumbing"
- Filter year: "2024"
- **Answer in 30 seconds**

The result isn't just faster; it's **far more presentable**, making it suitable for committee meetings, audits, and stakeholder discussions.

---

### **2. Budget Planning**

**Question:** *"What's our average monthly HVAC spending?"*

**Old way:**
- Filter by service
- Create pivot table by month
- Calculate average
- Hope you didn't break formulas
- 10 minutes

**New way:**
- Filter service: "HVAC"
- View monthly timeline chart
- **Answer visible immediately**

---

### **3. Audit Trail for Committee**

**Question:** *"Show me all fire services invoices over $5,000 from Q4 2024"*

**Old way:**
- Filter by service
- Filter by date range
- Filter by amount
- Screenshot or print
- 12 minutes

**New way:**
- Apply 3 filters
- Click "Download CSV"
- Attach to email
- **Answer + deliverable in 2 minutes**

---

### **4. Anomaly Detection**

**Question:** *"Why was November 2023 spending so high?"*

**Old way:**
- Create pivot table by month
- Spot the spike
- Filter November 2023
- Manually inspect rows
- 15 minutes

**New way:**
- View monthly timeline chart (spike visible instantly)
- Filter date range: November 2023
- Heatmap shows which contractor(s) caused it
- **Root cause in 3 minutes**

---

## Fun Fact

**Built in 1 day** as a side project during my working hours.

**Origin story:** Started in the `southB/` directory of my [masters-swe-ai repo](https://github.com/lfariabr/masters-swe-ai/tree/main/2025-T2/T2-Extra/southB) as a quick experiment. When I realized how useful it was, I:

1. Cleaned up the code
2. Made it modular
3. Created a standalone repo
4. Wrote comprehensive documentation
5. Deployed it publicly

---

## 🔗 Links & Resources

| Resource | Link |
|----------|------|
| **GitHub Repo** | [github.com/lfariabr/invoice-ledger](https://github.com/lfariabr/invoice-ledger) |
| **Source Code (southB origin)** | [masters-swe-ai/southB](https://github.com/lfariabr/masters-swe-ai/tree/main/2025-T2/T2-Extra/southB) |
| **Live Demo** | [streamlit app](https://invoice-ledger.streamlit.app/) |
| **Excel Template (fake data)** | [download & explore the data safely - *fake data*](https://github.com/lfariabr/invoice-ledger/raw/main/data/invoiceLedger.xlsx) |

---

## 🚀 Future Roadmap: From Dashboard to PropTech Platform

While the current version solves the immediate problem, here's the possible expansion plan:

### **1. Database Backend (PostgreSQL/Supabase)**

**Current:** Upload Excel each time

**Future:** Persistent database with incremental updates

**Benefits:**
- Historical version control
- Audit trail (who queried what, when)
- Multi-user access with authentication
- API for integration with other building systems

---

### **2. Predictive Analytics (ML)**

**Use cases:**
- *"Based on 4 years of data, predict next quarter's HVAC spending"*
- *"Which contractors are trending expensive year-over-year?"*
- *"Seasonal patterns: fire services spike in winter?"*

**Technical approach:**
- Time-series forecasting (Prophet)
- Contractor spending clustering
- Anomaly detection for unusual invoices

---

### **3. Automated Reporting**

**What it does:** Schedule weekly/monthly reports via email

**Example workflows:**
- Every Monday: Summary of last week's spending
- End of month: PDF report with charts for Owners Corporation
- Budget alerts: Email if spending exceeds threshold

---

### **4. Integration with Building Management Systems**

**Current:** Standalone dashboard

**Future:** Connect to existing PropTech stack

**Integrations:**
- **AFSS systems**: Auto-import fire inspection costs
- **Strata software**: Sync budget approvals
- **Contractor portals**: Pull invoices directly
- **Power BI**: Feed data to enterprise dashboards

---

## Let's Connect!

Building Invoice Ledger Analytics was a perfect case for me to **turn operational friction into engineering opportunity**.
If you're:

- Working in **PropTech** or building management
- Building internal tools for **finance** or **operations**
- Interested in **Python automation** and **data visualization**
- Looking for practical **Streamlit** examples
- Hiring for **backend/data/PropTech** roles

I'd love to connect:

- **LinkedIn:** [linkedin.com/in/lfariabr](https://www.linkedin.com/in/lfariabr/)
- **GitHub:** [github.com/lfariabr](https://github.com/lfariabr)
- **Portfolio:** [luisfaria.dev](https://luisfaria.dev)

---

**Tech Stack Summary:**

| Current | Future Extensions |
|---------|-------------------|
| Python, Streamlit, Pandas, openpyxl | PostgreSQL/Supabase, ML (Prophet/LangChain), Building System APIs (AFSS, Strata), React Native/PWA |

---

*Built with ☕ and firsthand building management experience*

*"The best code is the code that quietly removes friction from people's work."*
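
For readers who want to see the idea in code: the filter-then-aggregate loop at the heart of the Invoice Ledger dashboard can be sketched in a few lines of Pandas. This is a minimal illustration, not the code from the repo. The column names (`building`, `contractor`, `service`, `date`, `amount`), the helper names `filter_ledger` and `monthly_spend`, and the sample rows (including the "FireSafe" contractor) are all assumptions made up for the example; the real ledger schema may differ.

```python
import pandas as pd


def filter_ledger(df, building=None, years=None, contractor=None, min_amount=None):
    """Apply only the filters that were set; None means 'no filter' (hypothetical helper)."""
    mask = pd.Series(True, index=df.index)
    if building is not None:
        mask &= df["building"] == building
    if years:
        mask &= df["date"].dt.year.isin(years)
    if contractor is not None:
        mask &= df["contractor"] == contractor
    if min_amount is not None:
        mask &= df["amount"] >= min_amount
    return df[mask]


def monthly_spend(df):
    """Total spend per calendar month: the series behind a timeline chart."""
    return df.groupby(df["date"].dt.to_period("M"))["amount"].sum()


# Tiny made-up ledger standing in for the uploaded .xlsx file
ledger = pd.DataFrame({
    "building": ["A", "A", "B", "B"],
    "contractor": ["ABC Plumbing", "FireSafe", "ABC Plumbing", "ABC Plumbing"],
    "service": ["Plumbing", "Fire Services", "Plumbing", "Plumbing"],
    "date": pd.to_datetime(["2024-01-15", "2024-01-20", "2024-02-03", "2023-11-10"]),
    "amount": [1200.0, 5400.0, 800.0, 300.0],
})

# "How much did we spend with ABC Plumbing across all buildings in 2024?"
abc_2024 = filter_ledger(ledger, years=[2024], contractor="ABC Plumbing")
print(abc_2024["amount"].sum())
print(monthly_spend(abc_2024))
```

In a Streamlit app, the arguments to a helper like this would typically come from widgets such as select boxes and sliders, and the filtered frame could be offered for download via `DataFrame.to_csv`.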

Python, Streamlit, Data Engineering, Automation, Prop Tech

03/02/2026


Project

myRoster: from copypaste to 2-minute submissions

**From tedious spreadsheet rituals to 2-minute submissions: how I turned a workplace pain point into a productivity multiplier.**

> *"The best automation isn't flashy; it's invisible. It just works."*

---

## 🎯 The Challenge:

### When Spreadsheets Become a Time Sink

If you've ever worked in shift-based operations, you know the drill. Every roster cycle, the same tedious routine: open a spreadsheet, manually tick boxes for every single day you're available, triple-check you didn't miss anything, export it, draft an email, attach the file, and finally hit send. Rinse and repeat, week after week.

![Email asking for availability](https://dev-to-uploads.s3.amazonaws.com/uploads/articles/mdbgehalj1cfzmzdnjsr.png)

For one HR team I've met, this process was eating up valuable time that could have been spent on actual work:

| Pain Point | Impact |
|------------|--------|
| **Manual entry** | 15-20 minutes per roster cycle per employee |
| **Inconsistent formats** | HR receives varied submissions, coordination nightmare |
| **Error-prone** | Missed dates, wrong shifts, duplicate entries |
| **Soul-crushing** | Nobody looks forward to roster week |

I saw this inefficiency firsthand and thought: *There has to be a better way.*

> **Spoiler:** There was.

---

## 🤖 The Solution:

### _myRoster_: Automation Meets Simplicity

That's when **myRoster** was born: a lightweight and intuitive web application that transforms shift availability submission from a chore into a 2-minute task.

![myRoster Web App](https://dev-to-uploads.s3.amazonaws.com/uploads/articles/u678436flo6v69l7nbp8.png)

### **How It Works**

myRoster is built as a **Streamlit-powered** web app that runs entirely in the browser. No complex installations, no training sessions: just open the link and you're ready to go. Here's what makes it tick:

**1. Smart Roster Period Calculation**

The app automatically calculates the next roster cycle based on HR's scheduling logic.
No more guessing which dates to fill out: the system knows exactly what period you're submitting for, starting from the Monday three weeks ahead and spanning a full 4-week cycle.

**2. Interactive Spreadsheet Interface**

Instead of static forms, users interact with a familiar spreadsheet-like grid. Each week is organized in collapsible sections, showing dates, days of the week, and three shift columns (7am-3pm, 3pm-11pm, 11pm-7am). Just click the checkboxes for your available shifts, with no hunting through dropdowns or typing dates manually.

**3. One-Click Weekly Shortcuts**

Need to mark yourself available for all morning shifts in a week? One button. Want to clear an entire week? Another click. These shortcuts eliminate repetitive clicking, cutting entry time by more than half.

**4. Real-Time Progress Tracking**

As you make selections, myRoster instantly updates your coverage statistics, showing total shifts selected, number of days covered, and a visual progress bar. You know exactly where you stand before submitting.

**5. One-Click Submission**

Hit "Preview & Submit," and myRoster generates a clean CSV file, automatically emails it to HR with a professional HTML template, and optionally sends you a copy. The entire process, from opening the app to hitting send, takes under 2 minutes.
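
The roster period rule from step 1 is easy to express with the standard library. The sketch below is one possible reading of it, not the code in `helpers/roster.py`: it assumes "the Monday three weeks ahead" means the Monday of the current week plus three weeks, and the function names `next_roster_period` and `roster_dates` are hypothetical.

```python
from datetime import date, timedelta


def next_roster_period(today: date, weeks_ahead: int = 3, cycle_weeks: int = 4) -> tuple[date, date]:
    """Return (start, end) of the next roster cycle, both inclusive.

    Assumption: the cycle starts on the Monday of the current week plus
    `weeks_ahead` weeks. HR's real scheduling rule may count differently.
    """
    monday_this_week = today - timedelta(days=today.weekday())  # Monday is weekday 0
    start = monday_this_week + timedelta(weeks=weeks_ahead)
    end = start + timedelta(weeks=cycle_weeks) - timedelta(days=1)  # the closing Sunday
    return start, end


def roster_dates(start: date, end: date) -> list[date]:
    """Every calendar day in the cycle: one grid row per date."""
    return [start + timedelta(days=offset) for offset in range((end - start).days + 1)]


start, end = next_roster_period(date(2026, 1, 21))
print(start, end, len(roster_dates(start, end)))
```

Keeping the calculation a pure function of `today` makes it straightforward to unit test without touching the UI.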
### Tech Stack

I kept the technology intentionally lean:

| Layer | Technology | Purpose |
|-------|------------|---------|
| **Backend** | Python 3.10+ | Core logic, date calculations |
| **Frontend** | Streamlit | Interactive web UI, zero JS needed |
| **Data** | Pandas | Shift matrices, CSV export |
| **Email** | Gmail SMTP (GCP) | Automated delivery |
| **Deployment** | Streamlit Cloud | One-click deploy from GitHub |

**Project Structure:**

```
myRoster/
├── app.py              # Main Streamlit entry point
├── views/
│   └── rosterView.py   # UI components
├── helpers/
│   └── roster.py       # Date calculations
└── services/
    └── email.py        # Email automation
```

The modular architecture makes it easy to extend features or adapt for different scheduling needs.

---

## The Impact: Time Saved, Efficiency Gained

The results speak for themselves:

| Metric | Before | After |
|--------|--------|-------|
| **Submission time** | 15-20 minutes | ~2 minutes |
| **Format consistency** | Varies by employee | 100% standardized |
| **Error rate** | Frequent | Zero |
| **Employee satisfaction** | Dreaded task | Quick and painless |

---

## Future Roadmap: From MVP to Platform

While myRoster already delivers significant value in its current form, there's immense potential to evolve it from a standalone tool into a comprehensive workforce management platform. Here's what I've mapped out:

### **1. Multi-Provider Email Infrastructure**

**Current state:** Relies solely on Gmail SMTP via Google Cloud Platform

**Next iteration:** Integration with [Resend](https://resend.com) for more reliable transactional email delivery

**Why this matters:**
- **Automated reminders**: Schedule notifications 48 hours before roster deadlines
- **Smart alerts**: Notify HR when submissions are incomplete or coverage is below threshold
- **Employee confirmations**: Send automatic receipts when availability is successfully submitted
- **Higher deliverability**: Resend offers better inbox placement and detailed analytics compared to SMTP

This would transform myRoster from a submission tool into an active communication hub that keeps everyone informed and on track.

---

### **2. Robust Backend with Supabase**

**Current limitation:** No persistent user data, authentication, or preferences

**Next evolution:** Full-stack upgrade with Supabase as the backend

**Features unlocked:**
- **Authentication**: Secure login with email/password or SSO via EmploymentHero
- **User profiles**: Save preferred shifts, notification settings, and contact preferences
- **Historical data**: View past submissions, track coverage trends over time
- **Saved drafts**: Start filling out availability, save progress, and return later
- **Admin dashboard**: HR users get real-time coverage analytics, submission status tracking, and bulk operations
- **Role-based access control**: Employees, HR, and managers see different views and capabilities

**Why Supabase?**
- PostgreSQL database with real-time subscriptions (perfect for live coverage updates)
- Built-in authentication and row-level security
- RESTful and GraphQL APIs out of the box
- Integrates seamlessly with Python backends
- Free tier suitable for MVP, scales affordably

**Migration path:** Current CSV-based workflow becomes a fallback option while Supabase gradually handles user data, preferences, and analytics storage.

---

### **3. Machine Learning #1: Pattern Recognition & Predictive Scheduling**

**What it does:** Analyze historical availability data to identify patterns in employee behavior, building coverage needs, and seasonal trends.

**Use cases:**
- **Coverage prediction**: "Based on historical data, Building A typically has low evening shift coverage in December. Flag this 3 weeks in advance."
- **Employee behavior insights**: "User X consistently submits availability on the last day, so send them an early reminder."
- **Building-specific trends**: "Building B requires 15% more morning shifts during summer months, so adjust recommendations accordingly."
- **Anomaly detection**: Flag unusual submission patterns that might indicate scheduling conflicts or errors

**Technical approach:**
- Time-series analysis using scikit-learn or Prophet
- Clustering algorithms to group similar availability patterns
- Lightweight models that can run serverless (no heavy infrastructure needed)

**Real-world impact:** HR teams can proactively address coverage gaps before they become emergencies, and employees get personalized nudges based on their actual behavior patterns.

---

### **4. Machine Learning #2: RAG-Powered Knowledge Base**

**Inspired by:** [AI Engineering na Prática: Construindo RAG com Neural Networks](https://newsletter.nagringa.dev/p/ai-engineering-na-pratica-construindo)

**What it does:** Build a conversational AI assistant powered by Retrieval-Augmented Generation (RAG) that understands roster policies, shift rules, and employee FAQs.
**Employee experience:**
- *"Which shifts do I need to fill for Christmas week?"* → AI retrieves company holiday policies + roster dates and provides personalized guidance
- *"What happens if I can't work my scheduled shift?"* → AI surfaces shift swap procedures, contact info, and deadline policies
- *"Show me my availability history for Q4 2025"* → AI queries the database and presents formatted historical data

**HR experience:**
- Automated responses to repetitive questions
- Instant access to shift coverage analytics via natural language queries
- Policy enforcement reminders embedded in the chat experience

**Technical stack:**
- **Vector database** (Pinecone, Weaviate, or Supabase pgvector) for document embeddings
- **LLM integration** (OpenAI GPT-4, Claude, or open-source alternatives like Llama)
- **RAG framework** (LangChain or LlamaIndex) for retrieval logic
- **Knowledge base**: Company policies, shift rules, historical data, and FAQs

**Why this is powerful:** Instead of just automating form submission, myRoster becomes an intelligent assistant that *understands* the nuances of scheduling, reduces HR support burden, and makes policy information instantly accessible.

---

### **5. EmploymentHero API Integration**

**Current pain point:** Employees submit via myRoster → HR manually copies CSV data into EmploymentHero

**Automated future:** Direct API integration eliminates manual data entry entirely

**How it works:**
1. Employee submits availability in myRoster
2. System authenticates via EmploymentHero API
3. Availability data is automatically synced to the employee's EH profile
4. HR sees updated availability directly in their scheduling dashboard: no CSV, no copy-paste, no errors

**Additional benefits:**
- **Bi-directional sync**: Pull existing shift schedules from EH into myRoster for reference
- **Conflict detection**: Cross-reference submitted availability against existing scheduled shifts
- **Deeper insights**: Combine myRoster's ML analytics with EH's payroll and attendance data for comprehensive workforce planning
- **Single source of truth**: Eliminate data duplication and version control issues

**Technical implementation:** EmploymentHero provides a REST API with endpoints for employee data, shift scheduling, and time & attendance. Integration would involve:
- OAuth 2.0 authentication
- Middleware service to translate myRoster data models into EH-compatible formats
- Webhook listeners for real-time updates from EH back to myRoster

**Real-world impact:** This closes the loop entirely. What started as "save 15 minutes per employee" becomes "eliminate an entire manual workflow for HR", potentially saving dozens of hours per roster cycle across the organization.

> *Curious about the timeline? Check my [CHANGELOG](https://github.com/lfariabr/roster/blob/main/docs/CHANGELOG.md) for a detailed breakdown.*

---

## Key Takeaways

This project reinforced principles I apply to every build:

1. **Start with the pain point**: Every feature traces back to real user frustration
2. **Ship fast, iterate often**: MVP in days, not months
3. **Boring tech wins**: Streamlit + Pandas = production-ready in hours
4. **Design for extensibility**: Modular architecture enables future growth
5. **Measure impact**: 90% time reduction is the kind of number that screams ROI

---

## Try It Yourself

myRoster is live and open source:

| Resource | Link |
|----------|------|
| **Live Demo** | [myroster.streamlit.app](https://myroster.streamlit.app/) |
| **Source Code** | [github.com/lfariabr/roster](https://github.com/lfariabr/roster) |

If you're building internal tools or automating workflows, I'd love to hear how you approach similar problems.

---

## Let's Connect!

Building myRoster has been a perfect example of turning workplace friction into engineering opportunity.

If you're:

- Automating internal workflows
- Building tools with Streamlit
- Passionate about practical productivity solutions
- Interested in Python automation

I'd love to connect:

- **LinkedIn:** [linkedin.com/in/lfariabr](https://www.linkedin.com/in/lfariabr/)
- **GitHub:** [github.com/lfariabr](https://github.com/lfariabr)
- **Portfolio:** [luisfaria.dev](https://luisfaria.dev)

---

**Tech Stack Summary:**

| Current | Future |
|---------|--------|
| Python, Streamlit, Pandas, Gmail SMTP (GCP) | Supabase, Resend, OpenAI/RAG, EmploymentHero API, ML (scikit-learn/Prophet) |

---

*Built with ☕ and automation by [Luis Faria](https://luisfaria.dev)*

Python, Streamlit, Pandas, Automation, GCP, Google Appscript, Productivity

21/01/2026


Project

Building EigenAI: Teaching Math Foundations of AI Through Interactive Code

**From determinants to hill climbing algorithms: how I turned academic math into an interactive learning platform.**

> *"Whether it's concrete or code, structure is everything."*

---

## 🎓 The Challenge: Making Math "Click"

As a self-taught software engineer transitioning from 10+ years in project management, I enrolled in **MFA501 – Mathematical Foundations of Artificial Intelligence** at Torrens University Australia under [Dr. James Vakilian](https://au.linkedin.com/in/james-v-70183b28).

The subject covered everything from linear algebra to optimization algorithms, the mathematical backbone of modern AI applications in:

- **Machine Learning** (model training, optimization)
- **Natural Language Processing** (text embeddings, transformations)
- **Computer Vision** (image processing, feature extraction)
- **Speech Recognition** (signal processing, pattern matching)

But here's the problem: **abstract math doesn't stick unless you build something with it.**

So instead of just solving problems on paper, I built **[EigenAI](https://eigen-ai.streamlit.app/)**, an interactive Streamlit app that teaches mathematical concepts through live computation, step-by-step explanations, and real-time visualizations.

> **Can we make eigenvalues, gradients, and hill climbing algorithms as intuitive as playing with Legos?**

![Lego Wallpaper](https://dev-to-uploads.s3.amazonaws.com/uploads/articles/nzx8b9m0691khi325dlo.png)

That question drove the entire project.

---

## 🤖 What Is EigenAI?

![EigenAI taking a coffee getting ready to teach](https://dev-to-uploads.s3.amazonaws.com/uploads/articles/kit7ysi5jp0fjo7uy72g.png)

**EigenAI** (playing on "eigenvalues" and "AI foundations") is a web-based educational platform that implements core mathematical concepts from AI foundations.
It's structured around **four assessments** that progressively build complexity, with the app implementing the three case study assessments (2A, 2B, 3):

### **The 12-Week Journey**

The subject covered 12 progressive modules:

| Weeks | Topic | Overview |
|-------|-------|----------|
| Weeks 1-5 | Linear Algebra Foundations | Sets, vectors, matrices, transformations, eigenvalues |
| Weeks 6-9 | Calculus & Optimization | Derivatives, integrals, hill climbing, simulated annealing, genetic algorithms |
| Weeks 10-12 | Probability, Statistics & Logic | Foundations for AI reasoning and decision-making |

> ***Note: Module 6 was taught by [Dr. Niusha Shafiabady](https://www.niushashafiabady.com/)***

---

### **Assessment 1: Linear Algebra Fundamentals** *(Online Quiz)*

- ✅ Matrix operations (addition, multiplication, transpose)
- ✅ Vector operations (magnitude, unit vectors, dot product, cross product)
- ✅ Systems of equations (elimination, Gaussian elimination)
- ✅ Linear transformations (stretching, reflection, projection)

**The Challenge:** 60-minute timed quiz covering Modules 1-2 foundational concepts: no coding, pure mathematical understanding.

**Why It Matters:** These fundamentals are the building blocks for understanding how data flows through neural networks and ML algorithms.

> *Note: Assessment 1 was a quiz-only assessment. The EigenAI app implements the three case study assessments (2A, 2B, 3) that required coding solutions.*

---

### **Assessment 2A: Determinants & Eigenvalues** *(Case Study)*

- ✅ Recursive determinant calculation for n×n matrices
- ✅ Eigenvalue and eigenvector computation (2×2 matrices)
- ✅ Step-by-step mathematical notation using SymPy
- ✅ Input validation and error handling

**The Challenge:** Implement cofactor expansion from scratch. No NumPy allowed for core logic, only pure Python.
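
To make the challenge concrete, here is a minimal pure-Python sketch of recursive cofactor (Laplace) expansion along the first row. It illustrates the technique only and is not the code in `resolvers/determinant.py`; the function names `minor` and `determinant` are chosen for this example.

```python
def minor(matrix, row, col):
    """Return the sub-matrix left after deleting one row and one column."""
    return [r[:col] + r[col + 1:] for i, r in enumerate(matrix) if i != row]


def determinant(matrix):
    """Determinant of an n x n matrix by cofactor expansion along row 0."""
    n = len(matrix)
    if n == 0 or any(len(r) != n for r in matrix):
        raise ValueError("matrix must be square and non-empty")
    if n == 1:
        return matrix[0][0]
    if n == 2:
        return matrix[0][0] * matrix[1][1] - matrix[0][1] * matrix[1][0]
    total = 0
    for col, value in enumerate(matrix[0]):
        sign = -1 if col % 2 else 1  # cofactor sign (-1)^(0 + col)
        total += sign * value * determinant(minor(matrix, 0, col))
    return total


print(determinant([[1, 2, 3], [0, 1, 4], [5, 6, 0]]))  # invertible matrix
print(determinant([[2, 0, 1], [1, 3, 2], [1, 1, 1]]))  # singular: third row = (row 1 + row 2) / 3
```

Cofactor expansion does factorial work in n, which is fine for teaching-sized matrices; production libraries typically use LU decomposition instead.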
**Why It Matters:** Eigenvalues and eigenvectors are the foundation of:

- **PCA (Principal Component Analysis)**: dimensionality reduction for large datasets
- **Eigenfaces**: facial recognition algorithms
- **Feature compression**: reducing computational cost in ML models

Understanding determinants reveals why singular matrices break these algorithms.

---

### **Assessment 2B: Calculus & Neural Networks** *(Case Study)*

- ✅ Numerical integration (Trapezoid, Simpson's Rule, Adaptive Simpson)
- ✅ RRBF (Recurrent Radial Basis Function) gradient computation
- ✅ Manual backpropagation without TensorFlow/PyTorch
- ✅ Comparative analysis of integration methods with error bounds

**The Challenge:** Compute gradients by hand for a recurrent network, and feel the chain rule in your bones.

**Why It Matters:** Before using `model.fit()`, you should understand what `.backward()` actually does.

---

### **Assessment 3: AI Optimization Algorithms** *(Case Study)*

- ✅ Hill Climbing algorithm for binary image reconstruction
- ✅ Stochastic sampling variant (speed vs. accuracy trade-off)
- ✅ Pattern complexity selector (simple vs. complex cost landscapes)
- ✅ Real-time cost progression visualization

**The Challenge:** Reconstruct a 10×10 binary image from random noise using only local search: no global optimization, no backtracking.

**Why It Matters:** Hill climbing is the foundation of gradient descent, simulated annealing, and evolutionary algorithms. If you understand local optima here, you understand why neural networks get stuck.

> **💡 Key Insight from Module 6 ([Dr. Niusha Shafiabady](https://www.niushashafiabady.com/)):**
>
> Hill climbing can get stuck in local optima with no guarantee of finding the global optimum. The cure?
> - **Random restarts** (try multiple starting points)
> - **Random mutations** (introduce noise)
> - **Probabilistic acceptance** (simulated annealing)
>
> This limitation explains why modern AI uses ensemble methods and stochastic optimization.
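
The first of those cures, random restarts, is easy to see on a toy problem. The sketch below uses a hypothetical one-dimensional cost landscape invented for this illustration (it is not part of EigenAI): greedy descent from the left side gets trapped at index 2, while repeated random starts almost always reach the global minimum at index 7.

```python
import random

# Hypothetical 1-D cost landscape (lower is better), made up for this example.
# Local minimum at index 2 (cost 3); global minimum at index 7 (cost 0).
LANDSCAPE = [9, 5, 3, 6, 8, 4, 1, 0, 2, 7]


def hill_climb(start: int) -> int:
    """Greedy descent: step to the cheaper neighbour until none is cheaper."""
    current = start
    while True:
        neighbours = [i for i in (current - 1, current + 1) if 0 <= i < len(LANDSCAPE)]
        best = min(neighbours, key=lambda i: LANDSCAPE[i])
        if LANDSCAPE[best] >= LANDSCAPE[current]:
            return current  # no neighbour improves: a local (possibly global) minimum
        current = best


def random_restarts(n_restarts: int, rng: random.Random) -> int:
    """Run hill_climb from several random starts and keep the best end point."""
    ends = [hill_climb(rng.randrange(len(LANDSCAPE))) for _ in range(n_restarts)]
    return min(ends, key=lambda i: LANDSCAPE[i])


print(hill_climb(0))                           # trapped in the local minimum
print(random_restarts(50, random.Random(0)))   # restarts explore both basins
```

Each restart is independent, so the chance of missing the global basin shrinks geometrically with the number of restarts.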

---

## 🗓️ Project Timeline & Results

| Month | Assessment | Status |
|-------|------------|--------|
| October 2025 | Linear Algebra Quiz | **72.5% (C)** |
| October 2025 | Determinants & Eigenvalues | **82% (D)** |
| November 2025 | Integrals & RRBF | **84% (D)** |
| December 2025 | Hill Climbing | Awaiting results |

**Total Duration:** 12 weeks of intensive mathematical foundations for AI

---

## 🏗️ Technical Architecture

| Layer | Technology | Purpose |
|-------|------------|---------|
| **Frontend** | Streamlit | Interactive UI with zero JavaScript |
| **Core Logic** | Pure Python 3.10+ | Type-hinted, no NumPy in algorithms |
| **Math Rendering** | SymPy + matplotlib | LaTeX-quality equations |
| **Deployment** | Streamlit Cloud | One-click deploy from GitHub |
| **Version Control** | Git + GitHub | Full project history since commit 1 |

### **Why Pure Python for Core Logic?**

The assessment required implementing algorithms **without numerical libraries** to demonstrate understanding of the underlying math. This constraint forced me to:

- Write cofactor expansion from scratch (not just `np.linalg.det()`)
- Implement Simpson's Rule manually (not just `scipy.integrate.quad()`)
- Build hill climbing with custom neighbor generation (not just `scipy.optimize.minimize()`)

**Result:** Deep understanding of how these algorithms actually work under the hood.

---

## 🗝️ Key Features & Lessons Learned

### **1. Modular Architecture That Scales**

```
eigenai/
├── app.py                  # Main Streamlit entry point
├── views/                  # UI components (one per assessment)
│   ├── set1Problem1.py     # Determinants UI
│   ├── set1Problem2.py     # Eigenvalues UI
│   ├── set2Problem1.py     # Integration UI
│   ├── set2Problem2.py     # RRBF UI
│   └── set3Problem1.py     # Hill Climbing UI
└── resolvers/              # Pure Python algorithms
    ├── determinant.py
    ├── eigen_solver.py
    ├── integrals.py
    ├── rrbf.py
    ├── hill_climber.py
    └── constructor.py
```

**Lesson Learned:** Separating algorithm logic from UI made testing 10x easier.
When debugging the cost function, the UI stayed unchanged. When improving visualizations, the core math stayed untouched.

**Iterative Development:** EigenAI evolved through 23+ versions:

| Version | Milestone |
|---------|----------|
| v0.0.1 | Streamlit setup, assets, pages |
| v0.1.0 | ✅ Assessment 2A submission |
| v0.1.8 | Added Hill Climbing Binary Image Reconstruction |
| v0.2.0 | ✅ Assessment 2B submission (Integration + RRBF) |
| v0.2.4 | Added stochastic sampling to Hill Climber |
| v0.2.6 | Added complex pattern selector |
| v0.3.0 | ✅ Assessment 3 submission (Hill Climbing Algorithm) |

> Each assessment pushed the app forward, turning coursework into production-ready features. See the detailed [`CHANGELOG.md`](https://github.com/lfariabr/eigenAi/blob/master/docs/changelog.md).

---

### **2. Hill Climbing: When "Good Enough" Is Good Enough**

The most fascinating part was implementing **Hill Climbing** for image reconstruction:

**The Problem:**
- Start with a random 10×10 binary image (noise)
- Target: A circle pattern (100 pixels to match)
- Cost function: Hamming distance (count mismatched pixels)
- Neighborhood: Flip one pixel at a time (100 neighbors per state)

**The Algorithm:**

```python
# current_state, max_iterations, cost_function and generate_all_100_neighbors
# are assumed to be defined elsewhere in the resolver
cost = cost_function(current_state)
iterations = 0
while cost > 0 and iterations < max_iterations:
    # All states reachable by flipping exactly one pixel
    neighbors = generate_all_100_neighbors(current_state)
    best_neighbor = min(neighbors, key=cost_function)
    best_cost = cost_function(best_neighbor)
    if best_cost >= cost:
        break  # Stuck at local optimum
    current_state, cost = best_neighbor, best_cost
    iterations += 1
```

**Results:**
- Simple pattern (circle): **100% success rate**, avg 147 iterations
- Complex pattern (checkerboard): **85% success rate**, gets stuck in local optima
- Stochastic sampling (50 neighbors): **95% success**, 2x faster

**The Insight:** Hill climbing works beautifully on smooth cost landscapes but fails on complex ones.
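
For readers who want something they can run, here is a self-contained sketch of the same loop. It is an illustration under stated assumptions, not the code in `resolvers/hill_climber.py`: the image is stored as a flat list of 100 pixels, and `circle_target`, `hamming`, `neighbours` and the disc radius are all hypothetical choices made for this example.

```python
import random

SIZE = 10  # 10x10 image stored as a flat list of 100 pixels (0 or 1)


def circle_target(size: int = SIZE, radius: float = 3.5) -> list[int]:
    """Hypothetical target pattern: a filled disc centred in the grid."""
    centre = (size - 1) / 2
    return [int((r - centre) ** 2 + (c - centre) ** 2 <= radius ** 2)
            for r in range(size) for c in range(size)]


def hamming(state: list[int], target: list[int]) -> int:
    """Cost: number of pixels that differ from the target."""
    return sum(a != b for a, b in zip(state, target))


def neighbours(state: list[int]):
    """Yield every state that differs from `state` by exactly one pixel."""
    for i in range(len(state)):
        flipped = state.copy()
        flipped[i] ^= 1
        yield flipped


def hill_climb(start, target, max_iterations=1000):
    current, cost, iterations = start, hamming(start, target), 0
    while cost > 0 and iterations < max_iterations:
        best = min(neighbours(current), key=lambda s: hamming(s, target))
        best_cost = hamming(best, target)
        if best_cost >= cost:
            break  # no single flip improves the cost
        current, cost = best, best_cost
        iterations += 1
    return current, cost, iterations


rng = random.Random(42)
target = circle_target()
noise = [rng.randint(0, 1) for _ in range(SIZE * SIZE)]
final, final_cost, steps = hill_climb(noise, target)
print(f"initial cost={hamming(noise, target)}, final cost={final_cost}, iterations={steps}")
```

One caveat about this sketch: with a plain Hamming cost, every flip of a mismatched pixel lowers the cost by exactly one, so it always converges in as many iterations as there were mismatched pixels. It therefore does not reproduce the stalls or iteration counts reported above, which depend on details of the actual implementation.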
**This limitation explains why modern AI uses:**

- **Simulated annealing**: allows temporary cost increases (probabilistic acceptance)
- **Genetic algorithms**: explores multiple paths simultaneously (population-based)
- **Gradient descent with momentum**: escapes shallow local minima (velocity-based)

---

### **3. Stochastic Sampling: The Speed vs. Accuracy Trade-Off**

One enhancement I added beyond requirements was **stochastic hill climbing**: instead of evaluating all 100 neighbors, randomly sample 50.

**Trade-offs:**
- ⚡ **Speed:** 2x faster per iteration
- ⚠️ **Accuracy:** May miss optimal move 5% of the time
- 📊 **Final cost:** Avg 0.5 pixels worse than full evaluation

**Real-world application:** When you have 10,000 neighbors (e.g., 100×100 image), evaluating all is impractical. Stochastic sampling becomes mandatory.

---

## KPIs

For the hill climbing implementation, I tracked:

| Metric | Simple Pattern | Complex Pattern |
|--------|---------------|-----------------|
| **Initial Cost** | ~50 mismatched pixels | ~50 mismatched pixels |
| **Final Cost** | 0 (perfect) | 0-8 (may get stuck) |
| **Iterations** | ~147 | ~500 (hits plateau limit) |
| **Time** | <0.03s | <0.2s |
| **Neighbors Evaluated** | ~14,700 | ~50,000 |
| **Success Rate** | 100% | 85% |

**Key Takeaway:** Problem structure matters more than algorithm sophistication. A simple greedy search beats complex methods on convex problems.
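
The stochastic sampling idea from section 3 can be sketched as follows. This is a simplified illustration, not EigenAI's implementation: the function name `stochastic_hill_climb` is hypothetical, the state is assumed to be a flat list of pixels scored by Hamming distance, and, unlike a strict hill climber that stops when stuck, this version simply draws a fresh sample when no sampled flip improves the cost.

```python
import random


def hamming(state, target):
    """Cost: number of pixels that differ from the target."""
    return sum(a != b for a, b in zip(state, target))


def stochastic_hill_climb(start, target, sample_size, rng, max_iterations=10_000):
    """Hill climbing that evaluates only `sample_size` random one-pixel flips per iteration."""
    current = list(start)
    cost = hamming(current, target)
    evaluated = 0  # how many candidate flips were scored in total
    for _ in range(max_iterations):
        if cost == 0:
            break
        candidates = rng.sample(range(len(current)), sample_size)
        evaluated += sample_size
        # Under Hamming cost, flipping pixel i lowers the cost by 1 if it is
        # currently wrong and raises it by 1 otherwise.
        improving = [i for i in candidates if current[i] != target[i]]
        if improving:
            current[improving[0]] ^= 1
            cost -= 1
        # Otherwise keep the state and draw a new sample on the next pass.
    return current, cost, evaluated


rng = random.Random(7)
target = [(r + c) % 2 for r in range(10) for c in range(10)]  # checkerboard
noise = [rng.randint(0, 1) for _ in range(100)]
final, final_cost, evaluated = stochastic_hill_climb(noise, target, sample_size=50, rng=rng)
print(f"final cost={final_cost}, neighbours evaluated={evaluated}")
```

The trade-off shows up in the bookkeeping: each iteration scores half as many neighbours, but some iterations make no progress because the sample happened to miss every mismatched pixel.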

---

## 💥 Insights

This project transformed my understanding of AI math:

| Before | After |
|--------|-------|
| "Eigenvalues are λ where det(A - λI) = 0" (memorized formula) | Built cofactor expansion recursively, **saw** how determinants break down |
| "Gradient descent minimizes loss" (vague intuition) | Computed RRBF gradients by hand, **felt** the chain rule propagate |
| "Hill climbing gets stuck in local optima" (heard in lectures) | Watched hill climbing fail on checkerboards, **understood** why cost landscape matters |

This shift from abstract concepts to concrete understanding has changed how I approach AI problems: I now see the math not as a collection of formulas, but as a toolkit of interconnected ideas that I can manipulate and reason about directly. The hands-on work gave me an intuitive grasp of the mathematical foundations behind modern AI, so I can think about optimization and machine learning not only as **algorithms to apply** but also as **mathematical principles** that I understand and can use in practice.

---

## ❓ What's Next for EigenAI?
**Module 6 introduced three optimization paradigms:** - ✅ **Hill Climbing** (implemented in Assessment 3) - 🕐 **Simulated Annealing** (probabilistic escape from local optima) - 🕐 **Genetic Algorithms** (population-based evolutionary search) **Upcoming v0.4.X+ features:** **Enhanced Optimization Suite:** - Simulated Annealing comparison (temperature schedules, acceptance probability) - Genetic Algorithm variant (crossover, mutation, selection operators) - A* Search for pathfinding (admissible heuristics) - Q-Learning demo (reinforcement learning basics) **Platform Enhancements:** - **Authentication** — user login and progress tracking - **LLM Integration** — GPT-4 powered step-by-step explanations with rate limiting - **Custom Agent Framework** — Built from the ground-up using knowledge graphs and reasoning for problem-solving - **Supabase BaaS** — cloud storage for user data and solutions - **Backend Framework** — FastAPI or Flask for RESTful API - **Weekly Digest** — agentic integration for learning analytics - **Test Coverage** — comprehensive unit testing with pytest - **Security Enhancements** — input sanitization, HTTPS enforcement --- ## Try It Out If you want to explore EigenAI: - **🌍 Live Demo:** [eigen-ai.streamlit.app](https://eigen-ai.streamlit.app/) - 📋 [Assessment 2A, S1P1, Determinants, Reflective Report](https://github.com/lfariabr/masters-swe-ai/blob/master/2025-T2/T2-MFA/assignments/Assessment2/Set1Problem1/MFA501_Assessment2_Set1Problem1_report_Faria_Luis.pdf) - 📹 [Assessment 2A, S1P1, Determinants, Video Demo](https://github.com/lfariabr/masters-swe-ai/blob/master/2025-T2/T2-MFA/assignments/Assessment2/Set1Problem1/MFA501_Assessment2_Set1Problem1_video_Faria_Luis.mp4) - 📋 [Assessment 2A, S1P2, Eigenvalues, Reflective Report](https://github.com/lfariabr/masters-swe-ai/blob/master/2025-T2/T2-MFA/assignments/Assessment2/Set1Problem2/MFA501_Assessment2_Set1Problem2_report_Faria_Luis.pdf) - 📹 [Assessment 2A, S1P2, Eigenvalues, Video 
Demo](https://github.com/lfariabr/masters-swe-ai/blob/master/2025-T2/T2-MFA/assignments/Assessment2/Set1Problem2/MFA501_Assessment2_Set1Problem2_video_Faria_Luis.mp4) - 📋 [Assessment 2B, S2P1, Integrals, Reflective Report](https://github.com/lfariabr/masters-swe-ai/blob/master/2025-T2/T2-MFA/assignments/Assessment2/Set2Problem1/MFA501_Assessment2B_Set1_report_Faria_Luis.pdf) - 📹 [Assessment 2B, S2P1, Integrals, Video Demo](https://github.com/lfariabr/masters-swe-ai/blob/master/2025-T2/T2-MFA/assignments/Assessment2/Set2Problem1/MFA501_Assessment2B_Set1_demo_Faria_Luis.mp4) - 📋 [Assessment 2B, S2P2, RRBF, Reflective Report](https://github.com/lfariabr/masters-swe-ai/blob/master/2025-T2/T2-MFA/assignments/Assessment2/Set2Problem2/MFA501_Assessment2B_Set2_report_Faria_Luis.pdf) - 📹 [Assessment 2B, S2P2, RRBF, Video Demo](https://github.com/lfariabr/masters-swe-ai/blob/master/2025-T2/T2-MFA/assignments/Assessment2/Set2Problem2/MFA501_Assessment2B_Set2_demo_Faria_Luis.mp4) - 📋 [Assessment 3, Hill Climbing, Reflective Report](https://github.com/lfariabr/masters-swe-ai/blob/master/2025-T2/T2-MFA/assignments/Assessment3/Set3Problem1/MFA501_Assessment3_report_Faria_Luis.pdf) - 📹 [Assessment 3, Hill Climbing, Video Demo](https://github.com/lfariabr/masters-swe-ai/blob/master/2025-T2/T2-MFA/assignments/Assessment3/Set3Problem1/MFA501_Assessment3_demo_Faria_Luis.mp4) - 🤖 [EigenAi Source Code](https://github.com/lfariabr/eigenAi/tree/master) --- ## Let's Connect! Building EigenAI has been the perfect bridge between mathematical theory and practical software engineering. 
If you're: - Learning AI/ML foundations - Building educational tools - Passionate about making math accessible - Interested in optimization algorithms I’d love to connect: - **LinkedIn:** [linkedin.com/in/lfariabr](https://www.linkedin.com/in/lfariabr/) - **GitHub:** [github.com/lfariabr](https://github.com/lfariabr) - **Portfolio:** [luisfaria.dev](https://luisfaria.dev) --- ## References & Further Reading **Academic Sources:** - Strang, G. (2016). *Introduction to linear algebra* (5th ed.). Wellesley-Cambridge Press. - Goodfellow, I., Bengio, Y., & Courville, A. (2016). *Deep learning*. MIT Press. - Nocedal, J., & Wright, S. (2006). *Numerical optimization*. Springer. **Project Tech:** - [Streamlit Documentation](https://docs.streamlit.io/) - [SymPy Symbolic Math](https://docs.sympy.org/) - [Python Type Hints (PEP 484)](https://peps.python.org/pep-0484/) --- **Tags:** #machinelearning #python #streamlit #ai #mathematics #optimization #hillclimbing #education --- *Built with ☕ and calculus by [Luis Faria](https://luisfaria.dev) Student @ Torrens University Australia | MFA501 | Dec 2025*

Python, Streamlit, AI, Machine Learning, Mathematics, Learning

02/12/2025


Project

From Groomzilla to Full-Stack Engineer: Building Wedstack

Have you ever imagined being a Software Engineer, Project Manager, Designer, Customer Support agent, and Husband of the main stakeholder (**aka the Bride**) — all at once? **Yep, you read that right.** That was me for the past 4 weeks, during the conception, development, and deployment of **Wedstack** - a modern wedding invitation app I built from scratch. --- ## What is Wedstack? **Wedstack** is a wedding invitation web application built with the following stack: - **Next.js + TypeScript** for the frontend - **Node.js + GraphQL** on the backend - **MongoDB** for persistence - **Stripe** for handling gifts and payments The app features: - Real-time presence confirmation - A guestbook for leaving messages - A gift store with seamless checkout (Apple Pay, PayID, Pix, credit cards) - A dynamic, playful UI with **shadcn/ui** components - Multi-language toggle between **Portuguese and English** - Deployment experiments on AWS and DigitalOcean --- ## Inspirations Before diving into the keyboard, I explored some wedding-tech platforms that impressed me in the past with custom gifting and messaging features, such as [iCasei](https://www.icasei.com.br/) and [sayI.do](https://sayi.do/). But being the Software Engineer that I am (you can also call it stubbornness, haha), I refused to pay for an off-the-shelf solution. Instead, I decided this was the perfect chance to: - Showcase my full-stack skills - Learn Stripe’s API in depth (first time going live with it) - Craft more custom React components - Impress the main stakeholder (the Bride 👰)!!! 
---

## Features Bucket List

- ✅ Leave a message to the couple (stored in MongoDB)
- ✅ Stripe integration to receive gifts
- ✅ Confirm presence via a friendly widget
- ✅ Page with all wedding details
- ✅ Menu preview with design-matched layout
- ✅ Dynamic SVG backgrounds per section
- ✅ Soft animations and transitions with **shadcn**
- ✅ Multi-language support (en.json / pt.json)
- ✅ Page navigation arrows (infinite carousel style)
- ✅ AWS Beanstalk deployment trial (testing my [AWS Solutions Architect journey](https://dev.to/lfariaus/scaling-fastier-my-aws-solutions-architect-journey-with-forage-challenge-30j8))

---

## What Went Well

- MongoDB integration was smooth.
- Dynamic backgrounds in Next.js were super fun.
- The **multi-language switcher** turned out resilient and scalable.
- Stripe integration was awesome: I created 4 products (gifts), wired Apple Pay, PayID, and Pix - guests just tap and pay.

---

## Lessons Learned (***aka scars earned***)

- **12+ hours invested on AWS Beanstalk** → load balancers, Nginx, configs. Ended up retreating to the comfort zone: a $12/month DigitalOcean Droplet.
- **Layout churn**: multiple last-minute changes requested by the Bride (the “Main Stakeholder”... can't say no to that, right?). Next time: Figma first!
- Didn’t have time to integrate CRM-style automation (auto-email/WhatsApp after confirmation). Would have been a nice “wow factor.”

---

## Version History

| Version | Feature Highlights |
|---------|--------------------|
| **1.x** | Project setup + MongoDB integration |
| **2.x** | Guests module + Dockerization |
| **3.x** | Language switcher PT/EN + AWS deployment trials |
| **4.x** | Stripe integration + DigitalOcean Go Live |

👉 [Full changelog here](https://github.com/lfariabr/wedstack/blob/master/_docs/notesWedstack.md).
--- ## Open Source & Demo - **Code**: [github.com/lfariabr/wedstack](https://github.com/lfariabr/wedstack) - **Live site**: [weddingln.com](https://weddingln.com) Drop us a message, confirm your presence, or (if you’re generous) buy us a gift via our transparent Stripe checkout. After multiple stakeholder meetings, design debates, and countless “scope creep” requests, the bride is happy with the final product. Which means… I’m not sleeping on the couch (for now). --- ## Final Thoughts This project stretched me across roles: engineer, designer, PM, ops, support, and husband. But it was worth every late night. I learned Stripe end-to-end, polished my Next.js chops, wrestled with AWS, and delivered a product that my toughest client to date (my wife) is actually proud of. Sometimes the best side projects aren’t SaaS clones or hackathon demos. They’re personal, high-stakes apps with real users who care. And in my case, the **user was the Bride**. Failure was not an option!!!

Next.js, TypeScript, Node.js, GraphQL, MongoDB, Stripe

17/08/2025


Project

Engineering Principles Applied to Daily Life: Concierge Edition

> “What's easy to go through is hard to talk about, and what's hard to go through is easy to talk about.”
> *- Ariano Suassuna* (Translated from PT to EN)

One of the most unexpected things about my journey toward Australian Permanent Residency is how much I've been learning **outside** of Software Engineering and, interestingly enough, how much of that learning is still *engineering at heart*.

## Context: From Fullstack Dev to Front Desk Ops

While grinding toward the PR dream, one of the roles I’ve taken on is that of a **concierge**. Not glamorous, but definitely meaningful. I sit behind a desk, wearing a tie and a smile, handling dozens of tasks:

- Booking lifts and loading docks;
- Organizing and delivering parcels (sometimes to doors!);
- Managing keys for contractors;
- Logging what goes out, what comes in;
- Running admin across multiple platforms.

It’s an intense mix of **customer service, logistics, and situational awareness**, and it runs on *systems*.

Now here's the twist: between service calls and package handovers, I’ve been **coding like a beast** in anonymous browser tabs. Like that dog that keeps drinking water even when full, that’s how hungry I am for progress, you know?

## The Problem: A Monster Spreadsheet

At one building, I inherited a **confusing spreadsheet**. Take a look at it below:

![Building Spreadsheet](https://dev-to-uploads.s3.amazonaws.com/uploads/articles/lqxhf4v9tkstmjkwwrif.png)

It was the system to track parcel delivery instructions, but it was chaotic. It was slow to use because it required *reading between the lines*. New staff had no clue what to do unless they memorized the whole thing or spent a considerable part of the day mapping the instructions to the apartments.

> Residents asking: “Why hasn’t my parcel been delivered?”
>
> Staff asking: “Am I even supposed to deliver this?”

It was the sort of problem that could be solved with engineering.

## The Solution: Streamlit + Logic

So I built a tool.
A simple, open-source **Streamlit web app** that:

- Receives an apartment number
- Fetches the relevant delivery instructions
- Tells the Concierge on duty exactly what to do

Here’s the logic breakdown (Python snippet):

```python
# Snippet only: `st` is Streamlit (`import streamlit as st`), and `notes`,
# `notes_clean`, `lift` and `apt` are assumed to be set earlier from the
# spreadsheet row that matches the apartment number entered.
if "deliver to door" in notes_clean:
    st.success(f"🚪 Deliver to door. Use {lift} for apartment {apt}")
elif "deliver if requested" in notes_clean:
    st.info(f"📦 Deliver if requested. Use {lift} for apartment {apt}")
elif "store package and send notification" in notes_clean:
    st.info(f"✅ Store and notify. Use {lift} for apartment {apt}")
elif notes:
    # Fallback: show any other free-text note as-is.
    st.info(f"ℹ️ Note: {notes}. Use {lift} for apartment {apt}")
```

From confusion to clarity, in a single input field. **That’s engineering.**

## Outcome & Vision

The app is already deployed at one site and in daily use by concierges. Staff love it. Residents get faster service. Managers don’t need to train people on convoluted spreadsheets anymore.

This was a **small win**, but when reflecting on it, it really showed me something bigger: **Every problem is a system problem. Every system can be improved.**

## Try it Out

- 🖥 [Live Demo (Streamlit)](https://excelbm-swharf.streamlit.app/)
- 💻 [Open Source Code (GitHub)](https://github.com/lfariabr/concierge)

Thanks for reading this article. And remember: you don’t need a tech title to act like an Engineer. Build things. Solve problems. Share with the world. 🇦🇺🦘🔥
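For anyone who wants to experiment with the same idea outside Streamlit, the lookup and matching steps can be sketched as a plain function. Everything named here is hypothetical and made up for illustration: the `delivery_action` function, the `INSTRUCTIONS` table and its sample rows. The real app reads its data from the building spreadsheet.

```python
# Hypothetical sample data: apartment -> (lift, free-text notes).
INSTRUCTIONS = {
    "101": ("Lift A", "Deliver to door"),
    "202": ("Lift B", "Store package and send notification"),
    "303": ("Lift A", "Call resident first"),
}

def delivery_action(apt, table=INSTRUCTIONS):
    """Return a short instruction string for an apartment, or None if unknown."""
    row = table.get(str(apt).strip())
    if row is None:
        return None
    lift, notes = row
    notes_clean = notes.strip().lower()
    # Same substring checks as the Streamlit snippet, without the UI calls.
    if "deliver to door" in notes_clean:
        action = "Deliver to door"
    elif "deliver if requested" in notes_clean:
        action = "Deliver if requested"
    elif "store package and send notification" in notes_clean:
        action = "Store and notify"
    elif notes_clean:
        action = f"Note: {notes.strip()}"
    else:
        action = "No special instructions"
    return f"{action}. Use {lift} for apartment {apt}"

print(delivery_action("202"))
```

Keeping the decision in a function like this also makes it easy to unit test the rules separately from the user interface.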

Python, Streamlit, Pandas, Numpy, Matplotlib, Automation

08/08/2025


Project

ClinicTrends AI - Transforming Customer Feedback Into Intelligent Insights

As I transition from over 10 years in project management into life as a self-made software engineer, I’ve encountered a golden opportunity: designing and building a complete software project from scratch. One of the core subjects I’m currently studying is *Software Engineering Principles* at Torrens University Australia, under the guidance of Dr. Ranju Mandal — a distinguished lecturer in Cybersecurity and a member of the Centre for Artificial Intelligence and Optimisation. With more than six years of postdoctoral research in AI and Big Data Analytics, Dr. Mandal brings invaluable expertise to both our coursework and the ClinicTrends AI project. If that scenario doesn’t give you goosebumps… I’m not sure what will! --- ## How ClinicTrends AI Was Born During my previous career, I worked with a dataset of over one million customer records, including more than three years of NPS-style survey responses from multiple aesthetic clinics I helped manage. That dataset included over 25,000 individual responses. The more I studied software engineering and machine learning, the more I became obsessed with one question: > **Can we turn raw customer feedback into actionable, real-time intelligence for businesses?** That’s how **ClinicTrends AI** was born — and it’s already progressing into version 2.0! --- ## 🚀 What’s Under the Hood? 
We just wrapped up **Release 1.0**, which includes:

✅ A modular **Streamlit web application architecture**

✅ Implementation of **four machine learning models** for sentiment analysis:

- TF-IDF + Logistic Regression on comments
- Comment-score fusion models
- Hugging Face Transformers for context-aware predictions

✅ **Multi-language support** with automated translation

✅ **Unit testing** implemented via Pytest

---

## 🔮 What’s Coming Next

We’re actively developing **Release 2.0**, which will introduce:

- **Real-time alerts** for NPS score drops, enabling early intervention
- More **explainable AI** models for transparent sentiment predictions
- Enhanced feature engineering and hyperparameter optimization
- Future integrations like RESTful APIs and CRM connectivity

---

## 🎯 The Business Problem We’re Solving

Small and medium-sized aesthetic businesses often rely on traditional NPS surveys that provide historical snapshots of customer satisfaction. But by the time negative trends are visible, customer churn may already be happening.

**ClinicTrends AI** changes the game by delivering:

- Real-time sentiment analysis across thousands of survey responses
- Predictive analytics to identify at-risk customers before they leave
- Multi-model comparisons so business leaders can choose the best-performing ML approach
- Cost savings compared to enterprise tools like Medallia or Qualtrics, which can cost over $50,000 annually

---

## 💻 Technical Stack

Here’s a quick peek at our tech stack:

| Component | Technology |
|-----------|------------|
| Frontend | Streamlit |
| ML/NLP | Scikit-learn, Hugging Face Transformers, TextBlob |
| Data Processing | pandas, numpy |
| Visualizations | Altair, Plotly, matplotlib |
| Translation | deep-translator |
| Deployment | Streamlit Cloud (Docker-ready) |

---

## 🌐 Want to Explore?
If you’d like to check out the app: - **Live Demo:** [ClinicTrends AI Streamlit App](https://sep-torrens-dr-ranju-group-1.streamlit.app/) - **GitHub Repo:** [ClinicTrends AI – GitHub](https://github.com/lfariabr/masters-swe-ai/tree/master/T1-Software-Engineering-Principles/projects/clinictrends_ai) And if you’re curious about how Torrens University splits its trimester and assessments, I explained that in this article: [My Journey at Torrens University](https://luisfaria.dev/articles/6847ef52bc28d0f281fd54f0). --- ## 🚀 Let’s Connect! Building ClinicTrends AI has been the perfect bridge between my project management background and my software engineering future. I’d love to hear from fellow engineers, data scientists, or anyone passionate about transforming customer data into actionable insights. **Feel free to connect with me on [LinkedIn](https://linkedin.com/in/lfariabr) or drop me a message!**

Python, Streamlit, Scikit Learn, Machine Learning, TextBlob, Transformers, Hugging Face

03/07/2025


Project

How I Turned a 30min Daily Task into a 5min Breeze (Saving 100+ Hours a Year)

It was 2018, Excel was king, and if you could create a pivot table, you were a prodigy. If you knew macros? You were a God! Fun times! 😵

We had this sales reporting beast: a 12-page PDF generated every single day from a 20MB Excel file, bloated with pivot tables and legacy logic.

**The job?** Send out the daily performance report for the entire company.

### The manual routine looked like this:

1. Download 5 ERP reports in .xlsx format
2. Clean up the import template in Excel
3. Paste everything into the right places
4. Update 30+ pivot tables
5. Double-check every sheet
6. Manually export the PDF (via Ctrl+Click on sheets)
7. Send it over via email

⏱️ **Average time:** 30 minutes/day

📆 **Annual effort:** 126 hours

But here’s the thing: **only one person could do it.** And if they were out? Chaos.

---

Fast forward to 2025: the entire process is now automated, and I decided to revisit the project while reviewing old notes. It took 3 months of mapping the steps, testing scenarios, and refactoring into a scalable process. By the end of 2021, we launched the new version:

🔁 Same data

🎁 A better looking report

🎉 But over 80% less effort

---

### Here’s the new flow:

1. Upload raw files to a Google Drive folder
2. Run a Google Colab script (pandas, gspread, numpy)
3. Data flows into 2 Google Sheets, auto-formatted
4. Linked Google Slides update charts and visuals
5. Click “Update linked objects” → Export PDF → Done.

⏱️ **New time:** 5 minutes/day

📆 **Annual effort:** 21 hours

---

💥 That’s a **105-hour/year time save**.

🧠 No more human errors.

🔁 Easily replicable and documented.

🤖 Zero dependency on Excel wizardry or muscle-memory workflows.

I’m still amazed we were spending 6x more time for the exact same result.
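For readers who want to check the annual figures, here is a small sketch of the arithmetic. It assumes 252 working days per year, which is not stated in the post but is the day count that reproduces the 126-hour and 21-hour totals. The constant and the `annual_hours` helper are hypothetical names.

```python
WORKING_DAYS_PER_YEAR = 252  # assumption: not stated in the post

def annual_hours(minutes_per_day, days=WORKING_DAYS_PER_YEAR):
    """Convert a daily effort in minutes into hours per year."""
    return minutes_per_day * days / 60

before = annual_hours(30)  # manual Excel routine
after = annual_hours(5)    # Colab + Sheets + Slides flow
print(before, after, before - after)
```

Under that assumption the script prints `126.0 21.0 105.0`, matching the before, after, and saved hours quoted above.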
--- > **Optimization isn’t always about new technology, it’s about smarter flows.** ![Diagram displaying before & after efforts](https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkq9j5sxuz19vzze3vvhi.png) --- Have you ever turned a beastly manual process into a 5-minute miracle? #automation #datascience #processengineering #nocode #exceltoai #colab #productivity #growth

Excel, Microsoft, Python, Automation, Growth, Google, Data Science

04/06/2025


Project

Konquista: From Spreadsheet Chaos to 1,000 WhatsApp Messages a Day

Imagine spending hours daily cleaning phone numbers and sending 1,000 WhatsApp messages by hand. That was my reality last year, juggling lead follow-ups for 20+ clinics. Then I had an idea: **what if I could automate it all in one place?** This is the story of **Konquista** — how I went from manual chaos to a full-fledged SaaS using Google Sheets, Flask, and Django, boosting response rates by 30%. Here’s how it happened. --- ## The Problem We had just onboarded [SocialHub](https://socialhub.pro/) to manage communication across 20+ clinics using multiple WhatsApp numbers via their API. Leads were flowing in — but the process to follow up with those who didn’t schedule was messy: - Extract leads from the partner platform - Clean phone numbers (+55, 11, (11), -, etc.) - Match leads with internal ERP data - Use VLOOKUP in Excel to compare both sides - Send messages manually 😫 It was repetitive, error-prone, and wasted hours every day. --- ## The Insight After a few days of chaos, I asked myself: > **“What if I could centralize and automate everything in one place?”** That’s when **Konquista** was born — first as a hack, then a project, and finally a full-fledged SaaS. --- ## Version 1: Spreadsheet MVP ![Phase 1 flowchart](https://dev-to-uploads.s3.amazonaws.com/uploads/articles/cill823oqhwstzadmdfz.jpeg) Built using **Google Sheets + AppScript**: - **Contacts**: name, phone, source, created date - **Messages**: pre-written templates - **Appointments**: ERP data (from today-5 to today+5) - **Filtered Contacts**: VLOOKUP to remove scheduled contacts - **Sent Messages**: log of what was sent and when This worked — until it didn’t. AppScript’s 5-minute runtime limit made things slow and unstable after ~100+ leads. 
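Looking back at the cleanup and matching steps listed in the problem section (stripping `+55`, `(11)`, dashes, then removing leads that already have an appointment), here is a minimal sketch of how they could be expressed in Python. It is not the Konquista code. It assumes Brazilian numbers with a 2-digit area code and an 8- or 9-digit local number, a default area code of `11`, and an output format of `55` plus area code plus number; the function names and the contact dictionaries are hypothetical.

```python
import re

def normalize_br_phone(raw, default_ddd="11"):
    """Return digits as 55 + area code + number, or None if unusable."""
    digits = re.sub(r"\D", "", raw).lstrip("0")
    # Drop the country code only when the length shows it is really there
    # (55 is also a valid area code, so the prefix alone is not enough).
    if digits.startswith("55") and len(digits) in (12, 13):
        digits = digits[2:]
    # Local number without area code: fall back to the assumed default.
    if len(digits) in (8, 9):
        digits = default_ddd + digits
    if len(digits) not in (10, 11):
        return None
    return "55" + digits

def unscheduled_contacts(contacts, scheduled_phones):
    """Keep contacts whose normalized phone is not among the scheduled ones."""
    scheduled = {normalize_br_phone(p) for p in scheduled_phones}
    scheduled.discard(None)
    result = []
    for contact in contacts:
        phone = normalize_br_phone(contact["phone"])
        if phone is not None and phone not in scheduled:
            result.append({**contact, "phone": phone})
    return result
```

Normalizing both sides before comparing plays the role that VLOOKUP played in the spreadsheet: two differently formatted versions of the same number end up as the same key.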
--- ## Version 2: Python + Flask ![Phase 2 flowchart](https://dev-to-uploads.s3.amazonaws.com/uploads/articles/um3w45og6aorccpfh2qx.jpeg) I took the leap, bought a [Flask course on Udemy](https://www.udemy.com/course/python-and-flask-bootcamp-create-websites-using-flask), and wrote my first Flask monolith. **Source Code**: [github.com/lfariabr/konquista](https://github.com/lfariabr/konquista) ### What I used: - **Backend**: Flask + Blueprints - **Frontend**: Jinja2 templates - **Database**: PostgreSQL on Supabase ### Deployment: - Docker + docker-compose (local) - Nginx + Certbot + Gunicorn (production) - Vercel (frontend testing) - DigitalOcean droplet (hosting) Some manual jobs still remained (like clicking “Send All”), but it was a huge step forward. --- ## Version 3: Django + Celery 🚀 ![Phase 3 flowchart](https://dev-to-uploads.s3.amazonaws.com/uploads/articles/rjl6tedb4fgi9ff8f418.jpeg) This marked the beginning of something scalable. ### Major improvements: - **Async Queue Processing**: Celery + Redis to send WhatsApp messages in the background, with retries and monitoring - **CRM + ERP Sync**: Real-time sync for leads, appointments, payments — no more spreadsheets - **Campaign Engine**: Schedule recurring campaigns (daily, weekly, one-time) using custom filters - **Smart Templates**: Dynamic messages with `{name}`, `{appointment_time}`, `{unit}`, and media support - **Multi-user & Role Management**: Permissions, grouping, usage tracking, admin panel - **Dockerized Services**: web, worker, beat, redis, postgres, nginx, monitoring - **Monitoring**: Logs, retries, dead-letter queues, basic dashboards --- ## Results Konquista is now used by **20+ clinics**, sending around **1,000 WhatsApp messages daily**: - Welcoming new leads/customers - Sending appointment reminders - Reactivating cold leads (“We miss you!” campaigns) 📈 **30% increase in response rates**, fully automated. 
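The "Smart Templates" item from Version 3 mentions placeholders such as `{name}`, `{appointment_time}`, and `{unit}`. As a rough sketch of how such placeholders could be filled in plain Python (the actual Konquista engine may work differently), `str.format_map` with a dictionary subclass leaves unknown placeholders untouched instead of raising an error. The names `_KeepMissing` and `render_template` are hypothetical.

```python
class _KeepMissing(dict):
    """Dictionary that returns the placeholder itself for unknown keys."""

    def __missing__(self, key):
        return "{" + key + "}"

def render_template(template, lead):
    """Fill {placeholders} in a message template from a lead's fields."""
    return template.format_map(_KeepMissing(lead))

message = render_template(
    "Hi {name}, see you at {unit} on {appointment_time}!",
    {"name": "Ana", "unit": "Moema", "appointment_time": "10:00"},
)
print(message)
```

Leaving missing placeholders visible (rather than silently dropping them) makes an incomplete lead record easy to spot in a send log before the message goes out.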
--- ## Final Thoughts This journey showed me the real power of software and how a few lines of code can eliminate hours of repetitive work, drive real business results, and scale human effort. --- If you’re building something similar, want to exchange ideas, or just curious about how it all works, feel free to ping me. I’m happy to connect! 🔗 [GitHub Repo](https://github.com/lfariabr/konquest/)

Python, Django, Flask, Celery, Redis, ERP, CRM, Docker, Async

29/05/2025
